Hi,
I am trying to load data from DBpedia. Since DBpedia has no fixed schema, I parsed the dataset and generated a schema myself. Because there is no clear classification for vertices, I generated only one vertex type, which therefore contains a very large number of properties (3,726), and a large number of edge types (491) were generated as well. I then tried to load the data into TigerGraph: the schema is defined successfully, but the loading job CANNOT
be created. The loading job is as follows:
USE GRAPH dbpedia
CREATE LOADING JOB load_dbpedia FOR GRAPH dbpedia {
// define vertex
DEFINE FILENAME v_link_file;
// define edge
DEFINE FILENAME accessdate_file;
DEFINE FILENAME accessDate_file;
DEFINE FILENAME action_file;
DEFINE FILENAME agency_file;
DEFINE FILENAME agg1_file;
... // 491 definitions in total
// Load Vertex
LOAD v_link_file
TO VERTEX Link VALUES ($0, $0, $1, ..., $3724) USING header="false", separator="|";
// Load Edge
LOAD accessdate_file TO EDGE accessdate VALUES ($0, $1, $2) USING header="false", separator="|";
LOAD accessDate_file TO EDGE accessDate VALUES ($0, $1, $2) USING header="false", separator="|";
LOAD action_file TO EDGE action VALUES ($0, $1, $2) USING header="false", separator="|";
LOAD agency_file TO EDGE agency VALUES ($0, $1, $2) USING header="false", separator="|";
LOAD agg1_file TO EDGE agg1 VALUES ($0, $1, $2) USING header="false", separator="|";
... // 491 LOAD statements in total
}
I ran 'gsql dbpedia_load.gsql' to create the job, and the output was:
[aaron@w23 queries]$ gsql dbpedia_load.gsql
Using graph 'dbpedia'
null
I have several questions:
- Are there any hints about why this happens?
- How can I load data without a pre-defined schema? Is there any API that could do this automatically?
My setup:
- TigerGraph Developer Edition
- CentOS 6.8
- 128 GB memory
Thanks!
Sorry for the inconvenience. Could you please provide your TigerGraph version (gadmin version) and the GSQL log (gadmin log gsql)?
All data has to be ingested following the defined schema, but a loading job is optional; as an alternative, you can use the upsert REST endpoint:
https://docs.tigergraph.com/dev/restpp-api/built-in-endpoints#post-graph-graph_name-upsert-the-given-data
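For reference, a minimal sketch of an upsert call against that endpoint (9000 is the default RESTPP port; the vertex ids and attribute names below are placeholders, and the payload must still match the schema you defined):

curl -X POST "http://localhost:9000/graph/dbpedia" -d '
{
  "vertices": {
    "Link": {
      "link_id_1": {
        "some_attribute": { "value": "some value" }
      }
    }
  },
  "edges": {
    "Link": {
      "link_id_1": {
        "accessdate": {
          "Link": {
            "link_id_2": {
              "some_edge_attribute": { "value": "2019-10-23" }
            }
          }
        }
      }
    }
  }
}'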
Thanks.
Hi Xinyu,
Thank you for your reply! I have checked my TigerGraph version, which is tg_2.4.0_dev, and the GSQL log is attached below (I omitted part of the loading job for readability).
It is clear that the problem is a JVM OutOfMemoryError. How should I configure the memory for it?
Thanks!
GSQL Shell log
2019-10-23 18:24:12.265
2.4,tg_2.4.0_dev,f422d403dab4db5c96bd22822e79e7fd5a581283,f6b4892ad3be8e805d49ffd05ee2bc7e7be10dff
DEVELOPER_EDITION
PATH:/home/aaron/.venv/bin:/home/aaron/.syspre/usr/bin:/home/aaron/.syspre/opt/rh/devtoolset-2/root/usr/bin:/home/aaron/.syspre/bin:/home/aaron/.syspre/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.201.b09-2.el6_10.x86_64/jre/bin:/data/cghuan/anaconda3/bin:/home/aaron/.gium:/data/aaron/rust/bin:/data/aaron/curl-7.61.1/src:/data/aaron/wukong/deps/hwloc-1.11.7-install/bin:/data/opt/brew/bin:/data/opt/brew/sbin:/data/opt/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/data/opt/hadoop-2.6.0/bin:/usr/local/scala-2.10.4/bin:/data/opt/hadoop-2.6.0/hive/bin:/data/opt/hadoop-2.6.0/hbase/bin:/data/opt/spark-2.2.0/bin:/data/opt/flink-0.9.1/bin:/home/aaron/.gium/:/home/aaron/bin:/home/aaron/.gium/
BuildTime: Tue Jun 11 10:53:31 PDT 2019
I@20191023 18:24:12.407 (Util.java:262) gcc path: /home/aaron/.syspre/opt/rh/devtoolset-2/root/usr/bin/g++
I@20191023 18:24:12.412 (Util.java:2064) /home/aaron/tigergraph/.license/lic_endpoint exists false
I@20191023 18:24:12.415 (Util.java:2091) license expires at 01/01/2266
I@20191023 18:24:12.415 (Util.java:2101) lic: 9340876800 cur: 1571826252
I@20191023 18:24:12.419 (AdminServiceClient.java:127) establish connection w/ admin server
I@20191023 18:24:12.468 (AdminServiceClient.java:158) connected to admin server
I@20191023 18:24:12.503 (ZkClient.java:80) Connected to zk server.
I@20191023 18:24:13.041 (ZkClient.java:80) Connected to zk server.
I@20191023 18:24:21.019 (ZkClient.java:80) Connected to zk server.
I@20191023 18:24:24.293 (Driver.java:145) START SERVER!
I@20191023 18:24:25.436 (BaseHandler.java:16) BaseHandler: A
I@20191023 18:24:25.438 tigergraph|127.0.0.1:48436| (VersionHandler.java:34) v
I@20191023 18:26:43.452 (BaseHandler.java:16) BaseHandler: o
I@20191023 18:26:43.453 tigergraph|127.0.0.1:56584|249641201 (Util.java:314) GSHELL_TEST is empty
I@20191023 18:26:43.453 tigergraph|127.0.0.1:56584|249641201 (Util.java:321) session parameter graph is empty
I@20191023 18:26:43.453 tigergraph|127.0.0.1:56584|249641201 (Util.java:335) COMPILE_THREADS is empty
I@20191023 18:26:43.453 tigergraph|127.0.0.1:56584|249641201 (Util.java:349) FromGraphStudio is empty
I@20191023 18:26:43.454 tigergraph|127.0.0.1:56584|249641201 (LoginHandler.java:62) The gsql client is started on the server, and the working directory is /data/aaron/tigergraph/queries
I@20191023 18:26:43.455 tigergraph|127.0.0.1:56584|249641201 (Util.java:2101) lic: 9340876800 cur: 1571826403
I@20191023 18:26:43.455 tigergraph|127.0.0.1:56584|249641201 (LoginHandler.java:89) tigergraph login successfully.
I@20191023 18:26:43.722 tigergraph|127.0.0.1:56584|249641201 (BaseHandler.java:16) BaseHandler: i
I@20191023 18:26:43.722 tigergraph|127.0.0.1:56584|249641201 (Util.java:314) GSHELL_TEST is empty
I@20191023 18:26:43.722 tigergraph|127.0.0.1:56584|249641201 (Util.java:321) session parameter graph is empty
I@20191023 18:26:43.927 tigergraph|127.0.0.1:56584|249641201 (Util.java:335) COMPILE_THREADS is empty
I@20191023 18:26:43.927 tigergraph|127.0.0.1:56584|249641201 (Util.java:349) FromGraphStudio is empty
I@20191023 18:26:43.927 tigergraph|127.0.0.1:56584|249641201 (Util.java:291) New session parameters.
I@20191023 18:26:43.943 tigergraph|127.0.0.1:56584|249641201 (FileHandler.java:45) USE GRAPH dbpedia
DROP JOB load_dbpedia
CREATE LOADING JOB load_dbpedia FOR GRAPH dbpedia {
// define vertex
DEFINE FILENAME v_link_file;
// define edge
DEFINE FILENAME accessdate_file;
DEFINE FILENAME accessDate_file;
DEFINE FILENAME action_file;
DEFINE FILENAME agency_file;
DEFINE FILENAME agg1_file;
...
// load vertex
LOAD v_link_file
TO VERTEX Link VALUES ($0, $0, $1, $2, $3, ..., $3724) USING header="false", separator="|";
// load edge
LOAD accessdate_file
TO EDGE accessdate VALUES ($0, $1, $2) USING header="false", separator="|";
LOAD accessDate_file
TO EDGE accessDate VALUES ($0, $1, $2) USING header="false", separator="|";
LOAD action_file
TO EDGE action VALUES ($0, $1, $2) USING header="false", separator="|";
LOAD agency_file
TO EDGE agency VALUES ($0, $1, $2) USING header="false", separator="|";
LOAD agg1_file
TO EDGE agg1 VALUES ($0, $1, $2) USING header="false", separator="|";
...
}
I@20191023 18:26:43.976 tigergraph|127.0.0.1:56584|249641201 (CatalogLock.java:223) Lock: Short && Read
I@20191023 18:26:53.346 tigergraph|127.0.0.1:56584|249641201 (CatalogManager.java:811) getCatalog: null
I@20191023 18:26:53.346 tigergraph|127.0.0.1:56584|249641201 (CatalogManager.java:768) switch to graph 'dbpedia'.
I@20191023 18:26:53.347 tigergraph|127.0.0.1:56584|249641201 (CatalogLock.java:273) Unlock: Short && Read
I@20191023 18:26:53.348 tigergraph|127.0.0.1:56584|249641201 (CatalogLock.java:208) Lock: Short && Write
I@20191023 18:27:01.965 tigergraph|127.0.0.1:56584|249641201 (CatalogManager.java:811) getCatalog: dbpedia
I@20191023 18:27:01.968 tigergraph|127.0.0.1:56584|249641201 (CatalogLock.java:270) Unlock: Short && Write
I@20191023 18:27:01.969 tigergraph|127.0.0.1:56584|249641201 (CatalogLock.java:208) Lock: Short && Write
I@20191023 18:27:10.486 tigergraph|127.0.0.1:56584|249641201 (CatalogManager.java:811) getCatalog: dbpedia
I@20191023 18:27:24.430 tigergraph|127.0.0.1:56584|249641201 (Util.java:1878) Run bash command: rm -f /home/aaron/tigergraph/dev/gdk/gsql/.catalog/0/codegen//GSQL_UDT.cpp
I@20191023 18:27:24.435 tigergraph|127.0.0.1:56584|249641201 (Util.java:1894) Finished
find token udf source files, compile if any one is newer than .so
skip token.so
I@20191023 18:27:24.499 tigergraph|127.0.0.1:56584|249641201 (AdminServiceClient.java:231) send /home/aaron/tigergraph/dev/gdk/gsql/.tmp/TokenBank.so to /home/aaron/tigergraph/bin
E@20191023 18:31:12.558 tigergraph|127.0.0.1:56584|249641201 (QueryBlockHandler.java:132) null
java.lang.OutOfMemoryError
at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(Unknown Source)
at java.lang.AbstractStringBuilder.append(Unknown Source)
at java.lang.StringBuffer.append(Unknown Source)
at java.io.StringWriter.write(Unknown Source)
at com.google.gson.stream.JsonWriter.newline(JsonWriter.java:603)
at com.google.gson.stream.JsonWriter.beforeName(JsonWriter.java:618)
at com.google.gson.stream.JsonWriter.writeDeferredName(JsonWriter.java:401)
at com.google.gson.stream.JsonWriter.value(JsonWriter.java:480)
at com.google.gson.internal.bind.TypeAdapters$3.write(TypeAdapters.java:148)
at com.google.gson.internal.bind.TypeAdapters$3.write(TypeAdapters.java:133)
at com.google.gson.internal.bind.TypeAdapterRuntimeTypeWrapper.write(TypeAdapterRuntimeTypeWrapper.java:69)
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$1.write(ReflectiveTypeAdapterFactory.java:127)
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.write(ReflectiveTypeAdapterFactory.java:245)
at com.google.gson.internal.bind.TypeAdapterRuntimeTypeWrapper.write(TypeAdapterRuntimeTypeWrapper.java:69)
at com.google.gson.internal.bind.CollectionTypeAdapterFactory$Adapter.write(CollectionTypeAdapterFactory.java:97)
at com.google.gson.internal.bind.CollectionTypeAdapterFactory$Adapter.write(CollectionTypeAdapterFactory.java:61)
at com.google.gson.internal.bind.TypeAdapterRuntimeTypeWrapper.write(TypeAdapterRuntimeTypeWrapper.java:69)
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$1.write(ReflectiveTypeAdapterFactory.java:127)
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.write(ReflectiveTypeAdapterFactory.java:245)
at com.google.gson.internal.bind.TypeAdapterRuntimeTypeWrapper.write(TypeAdapterRuntimeTypeWrapper.java:69)
at com.google.gson.internal.bind.CollectionTypeAdapterFactory$Adapter.write(CollectionTypeAdapterFactory.java:97)
at com.google.gson.internal.bind.CollectionTypeAdapterFactory$Adapter.write(CollectionTypeAdapterFactory.java:61)
at com.google.gson.internal.bind.TypeAdapterRuntimeTypeWrapper.write(TypeAdapterRuntimeTypeWrapper.java:69)
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$1.write(ReflectiveTypeAdapterFactory.java:127)
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.write(ReflectiveTypeAdapterFactory.java:245)
at com.google.gson.internal.bind.TypeAdapterRuntimeTypeWrapper.write(TypeAdapterRuntimeTypeWrapper.java:69)
at com.google.gson.internal.bind.CollectionTypeAdapterFactory$Adapter.write(CollectionTypeAdapterFactory.java:97)
at com.google.gson.internal.bind.CollectionTypeAdapterFactory$Adapter.write(CollectionTypeAdapterFactory.java:61)
at com.google.gson.internal.bind.TypeAdapterRuntimeTypeWrapper.write(TypeAdapterRuntimeTypeWrapper.java:69)
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$1.write(ReflectiveTypeAdapterFactory.java:127)
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.write(ReflectiveTypeAdapterFactory.java:245)
at com.google.gson.Gson.toJson(Gson.java:704)
at com.google.gson.Gson.toJson(Gson.java:683)
at com.google.gson.Gson.toJson(Gson.java:658)
at com.tigergraph.schema.Catalog.e(Catalog.java:12431)
at com.tigergraph.schema.Catalog.iy(Catalog.java:15125)
at com.tigergraph.schema.Catalog.k(Catalog.java:15203)
at com.tigergraph.schema.Catalog.iz(Catalog.java:15195)
at com.tigergraph.schema.Catalog.l(Catalog.java:10546)
at com.tigergraph.schema.b.p.a(QueryBlockHandler.java:544)
at com.tigergraph.schema.b.p.a(QueryBlockHandler.java:179)
at com.tigergraph.schema.b.p.a(QueryBlockHandler.java:122)
at com.tigergraph.schema.b.i.a(FileHandler.java:50)
at com.tigergraph.schema.b.c.handle(BaseHandler.java:33)
at com.sun.net.httpserver.Filter$Chain.doFilter(Unknown Source)
at sun.net.httpserver.AuthFilter.doFilter(Unknown Source)
at com.sun.net.httpserver.Filter$Chain.doFilter(Unknown Source)
at sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(Unknown Source)
at com.sun.net.httpserver.Filter$Chain.doFilter(Unknown Source)
at sun.net.httpserver.ServerImpl$Exchange.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
I@20191023 18:31:12.561 tigergraph|127.0.0.1:56584|249641201 (CatalogLock.java:270) Unlock: Short && Write
I@20191023 18:31:12.585 tigergraph|127.0.0.1:56584|249641201 (BaseHandler.java:16) BaseHandler: a
I@20191023 18:31:12.586 tigergraph|127.0.0.1:56584|249641201 (AbortClientSessionHandler.java:29) AbortSession added for session = 249641201
I@20191023 18:31:12.586 tigergraph|127.0.0.1:56584|249641201 (AbortClientSessionHandler.java:30) AbortLoadingProgress added for session = 249641201
This is a known issue when a loading job is too big.
Could you either reduce the schema size or split the loading job into many smaller jobs? A sketch of the split approach follows.
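For example, keep the vertex load in its own job and group the edge loads into small batches (job names and batch sizes below are arbitrary):

CREATE LOADING JOB load_dbpedia_vertex FOR GRAPH dbpedia {
  DEFINE FILENAME v_link_file;
  LOAD v_link_file
    TO VERTEX Link VALUES ($0, $0, $1, ..., $3724) USING header="false", separator="|";
}

CREATE LOADING JOB load_dbpedia_edges_01 FOR GRAPH dbpedia {
  DEFINE FILENAME accessdate_file;
  DEFINE FILENAME accessDate_file;
  LOAD accessdate_file TO EDGE accessdate VALUES ($0, $1, $2) USING header="false", separator="|";
  LOAD accessDate_file TO EDGE accessDate VALUES ($0, $1, $2) USING header="false", separator="|";
}
// ... and so on, e.g. a few dozen edge types per job instead of all 491 in one

Each small job is created and run on its own, so the GSQL server never has to process the entire 492-statement job in a single request.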