Error loading file

Jon_Herke · April 8, 2020, 5:11pm

Hello,

I created a graph and then a loading job. The loading job starts to run fine but it gets slower over time and then fails. I have attached the log file.

TigerGraph is running on a Linux instance with 32 GB of RAM and 16 cores, and it is installed on a separate 500 GB disk.

By the time it failed, the job had loaded more than 520 million nodes (167 GB). The job loads only one vertex type in the graph.

Here is the job definition:

USE GRAPH Morpheus

BEGIN

CREATE LOADING JOB load_morpheus_events FOR GRAPH Morpheus {

   DEFINE FILENAME events="/neo4j/morpheus/event_nodes.txt";

   LOAD events TO VERTEX event VALUES ($0, $1, $2, $3, $4, $5) USING header="false", separator="\t";

}

END

Are there some parameters that can/must be changed?

The same file loads fine in Neo4j on the same Linux instance, so I’m not sure what the issue could be.

Thank you!

Jon_Herke · April 8, 2020, 5:12pm

Hi Camilo,

It seems that the engine timed out the loading request. Can you check the status of the system with “gadmin status” and check the disk space where TigerGraph is installed? Please also send us the logs $logs/GPE_1_1/log.INFO and $logs/RESTPP-LOADER_1_1/log.INFO.

Best,

Dan

Jon_Herke · April 8, 2020, 5:12pm

Hi Dan,

Here are the results of the commands. The log files are here; I wasn’t able to upload them to this thread.

$ gadmin status

Welcome to TigerGraph Developer Edition, for non-commercial use only.

=== zk ===

[SUMMARY][ZK] process is up

[SUMMARY][ZK] /tiger/tigergraph/zk is ready

=== kafka ===

[SUMMARY][KAFKA] process is up

[SUMMARY][KAFKA] queue is ready

=== gse ===

[SUMMARY][GSE] process is up

[SUMMARY][GSE] id service is ready (online)

=== dict ===

[SUMMARY][DICT] process is up

[SUMMARY][DICT] dict server is ready

=== ts3 ===

[SUMMARY][TS3] process is up

[SUMMARY][TS3] ts3 is ready

=== graph ===

[SUMMARY][GRAPH] graph is ready

=== nginx ===

[SUMMARY][NGINX] process is up

[SUMMARY][NGINX] nginx is ready

=== restpp ===

[SUMMARY][RESTPP] process is up

[SUMMARY][RESTPP] restpp is ready

=== gpe ===

[SUMMARY][GPE] process is up

[SUMMARY][GPE] graph is ready (online)

=== gsql ===

[SUMMARY][GSQL] process is up

[SUMMARY][GSQL] gsql is ready

=== Visualization ===

[SUMMARY][VIS] process is up (VIS server PID: 119541)

[SUMMARY][VIS] gui server is up

$ df -h

Filesystem                   Size  Used Avail Use% Mounted on

/dev/sdd1                    493G  167G  301G  36% /tiger

Thank you.

Jon_Herke · April 8, 2020, 5:13pm

Hi Camilo,

In the Developer Edition, TigerGraph needs enough memory to hold all vertices and edges in memory. According to the logs, memory reached its limit in the engine, so the engine put most of its jobs to sleep and held all ongoing loading requests until the previous ones had been processed. The held requests eventually timed out, which caused the loading job to fail. You can check the memory used by each TigerGraph component with “gadmin status -v”, or the memory of the whole system with “free -g”. For best performance, we usually suggest that customers provide system memory of at least twice the disk space used by the engine storage. To check the disk usage of the engine store, use “gadmin status -v graph”.
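To put the sizing guideline above into numbers, here is a minimal Python sketch. It assumes the 167 GB shown by “df -h” on /tiger is a rough stand-in for the engine store size (the real figure would come from “gadmin status -v graph”), and the function name is only illustrative.

```python
def recommended_memory_gb(engine_store_gb: float, factor: float = 2.0) -> float:
    """Memory suggested by the 'at least twice the engine store' guideline."""
    return engine_store_gb * factor


# Figures from this thread. Assumption: disk usage on /tiger approximates
# the engine store; the real value may be somewhat smaller.
store_gb = 167
ram_gb = 32

needed_gb = recommended_memory_gb(store_gb)
shortfall_gb = needed_gb - ram_gb

print(f"recommended: {needed_gb:.0f} GB")   # recommended: 334 GB
print(f"available:   {ram_gb} GB")          # available:   32 GB
print(f"shortfall:   {shortfall_gb:.0f} GB")  # shortfall:   302 GB
```

Under that assumption the instance has roughly a tenth of the suggested memory, which is consistent with the engine running out of memory partway through the load.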

Here is a summary of the relevant logs.

RESTPP-LOADER LOG

Dans-MacBook-Pro-2:Downloads danhu$ grep -rn "98631:RESTPP-LOADER_1_1:1533613116574" RESTPP-LOADER_1_1_log.INFO

RESTPP-LOADER_1_1_log.INFO:1187773:I0806 20:38:36.575000 34978 handler.cpp:225] Engine_req|RawRequest|98631:RESTPP-LOADER_1_1:1533613116574|POST|url = restpp/onlineloader?|payload_data.size() = 183017|api = v2

RESTPP-LOADER_1_1_log.INFO:1188719:I0806 20:48:36.815372 34970 dispatcher.cpp:1175] Comp_Dispatcher|20:48:36.815393|98631:RESTPP-LOADER_1_1:1533613116574: Code 1,  num_expectresults_ 1,  holder result  1,  ready result 1: 0,$1$0,  ready result 2: 0

RESTPP-LOADER_1_1_log.INFO:1188720:I0806 20:48:36.869263 34979 worker.cpp:1070] Engine_req|OnError|98631:RESTPP-LOADER_1_1:1533613116574|gle|dispatcher|dispatcher timeout

RESTPP-LOADER_1_1_log.INFO:1188721:I0806 20:48:36.904887 34979 requestrecord.cpp:221] Engine_req|ReturnResult|98631:RESTPP-LOADER_1_1:1533613116574|214

These are the relevant log lines for the first aborted loading request, “98631:RESTPP-LOADER_1_1:1533613116574”, in RESTPP-LOADER. They show that the loader started the request at 20:38:36 and that it timed out at 20:48:36. This is expected, because the default system timeout is 10 minutes.
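The 10-minute gap can be confirmed directly from the two log lines. The following is a small Python sketch, not a TigerGraph tool: it assumes the glog-style prefix seen in these excerpts (severity letter, month and day, then time of day with microseconds) and that both lines fall on the same day.

```python
import re
from datetime import timedelta

# Severity letter, MMDD, then HH:MM:SS.microseconds, as in "I0806 20:38:36.575000".
STAMP = re.compile(r"[IWEF](\d{2})(\d{2}) (\d{2}):(\d{2}):(\d{2})\.(\d{6})")


def time_of_day(line: str) -> timedelta:
    """Return the time of day encoded in a glog-style log line."""
    match = STAMP.search(line)
    if match is None:
        raise ValueError("no glog timestamp found in line")
    _month, _day, hours, minutes, seconds, micros = map(int, match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds, microseconds=micros)


# The request start and the dispatcher timeout, copied from the grep output above.
start_line = (
    "RESTPP-LOADER_1_1_log.INFO:1187773:I0806 20:38:36.575000 34978 handler.cpp:225] "
    "Engine_req|RawRequest|98631:RESTPP-LOADER_1_1:1533613116574|POST|"
    "url = restpp/onlineloader?|payload_data.size() = 183017|api = v2"
)
error_line = (
    "RESTPP-LOADER_1_1_log.INFO:1188720:I0806 20:48:36.869263 34979 worker.cpp:1070] "
    "Engine_req|OnError|98631:RESTPP-LOADER_1_1:1533613116574|gle|dispatcher|dispatcher timeout"
)

elapsed = time_of_day(error_line) - time_of_day(start_line)
print(f"{elapsed.total_seconds():.1f} s")  # 600.3 s
```

The request was failed about 600 seconds after it was received, which matches the 10-minute default timeout.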

GPE LOG

On the engine side, however, system memory was already unhealthy during the same period, and the request was put on hold as soon as the engine received it:

I0806 20:43:42.424974 19471 globalmemoryallocator.cpp:265] Comp_Memory|/tiger/tigergraph/gstore/0/part//.mv/661/1533613421358/_vertexpack force disk mode because System Memory not Health

I0806 20:43:42.425074 19470 globalmemoryallocator.cpp:265] Comp_Memory|/tiger/tigergraph/gstore/0/part//.mv/661/1533613421358/edgelist.bin force disk mode because System Memory not Health

I0806 20:43:42.425150 19469 globalmemoryallocator.cpp:265] Comp_Memory|/tiger/tigergraph/gstore/0/part//.mv/661/1533613421358/vertex.bin force disk mode because System Memory not Health

I0806 20:43:42.427094 19494 post_listener.cpp:138] Request|gle, **98631:RESTPP-LOADER_1_1:1533613116574** :NNAC,294,,0|Post_listener|getDeltaMessage|552717|552716|56204

I0806 20:43:42.432273 19492 gtimer.cpp:134] (11.679 ms) Rebuild _661_ RunOneSegment reload done

I0806 20:43:42.433750 19496 enginejobrunner.cpp:450] Engine_PullDelta|Pull|154600|552717|552717|1ms

I0806 20:43:42.434532 19496 enginejobrunner.cpp:494] Engine_PullDelta|System Memory not health. Sleeping.

I0806 20:43:42.437981 19492 gtimer.cpp:134] (5.717 ms) Rebuild _661_ RunOneSegment switch done

I0806 20:43:42.438215 19492 gtimer.cpp:134] (0.141 ms) DeltaSegmentRecords|Rebuild Compact_Until 661 cleaned_attr_count 145 remain_att_count 23 cleaned_edge_count 0 remain_edge_count 0
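To see how often the engine reports memory pressure across a whole GPE log, the messages in the excerpt above can be counted. This Python sketch is only an illustration: it assumes the two message texts shown here (“force disk mode because System Memory not Health” and “System Memory not health. Sleeping.”) are the ones of interest, and the function name is made up.

```python
def memory_pressure_summary(lines):
    """Count GPE log lines that report unhealthy system memory."""
    summary = {"force_disk_mode": 0, "sleeping": 0}
    for line in lines:
        lowered = line.lower()
        # The excerpts spell the marker as both "Health" and "health".
        if "system memory not health" not in lowered:
            continue
        if "force disk mode" in lowered:
            summary["force_disk_mode"] += 1
        elif "sleeping" in lowered:
            summary["sleeping"] += 1
    return summary


# Four lines copied from the GPE excerpt above.
sample = [
    "I0806 20:43:42.425074 19470 globalmemoryallocator.cpp:265] Comp_Memory|/tiger/tigergraph/gstore/0/part//.mv/661/1533613421358/edgelist.bin force disk mode because System Memory not Health",
    "I0806 20:43:42.425150 19469 globalmemoryallocator.cpp:265] Comp_Memory|/tiger/tigergraph/gstore/0/part//.mv/661/1533613421358/vertex.bin force disk mode because System Memory not Health",
    "I0806 20:43:42.433750 19496 enginejobrunner.cpp:450] Engine_PullDelta|Pull|154600|552717|552717|1ms",
    "I0806 20:43:42.434532 19496 enginejobrunner.cpp:494] Engine_PullDelta|System Memory not health. Sleeping.",
]

print(memory_pressure_summary(sample))  # {'force_disk_mode': 2, 'sleeping': 1}
```

For a real log, an open file object can be passed in place of the sample list, since the function only iterates over lines.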

Best Wishes,

Dan