GPE crashed while running a backup

I ran the command below to create a local backup. I am using the Docker image, version 3.9.3.

gadmin backup create production

The GPE crashed. Here is the error log:

Log file created at: 2023/11/12 16:01:32
Running on machine: ip-172-31-27-249
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E1112 16:01:32.251708 10901 gpedaemon.cpp:86] Running as a leader |time:16:01:32.251812
E1112 16:01:36.778251 13615 gthreadpool.cpp:122] Error: boost::filesystem::remove: Directory not empty: "/home/tigergraph/tigergraph/data/gstore/0/part//.mv/174/1699544446270/174_SI"
E1112 16:01:38.905006 13615 gthreadpool.cpp:122] Error: boost::filesystem::remove: Directory not empty: "/home/tigergraph/tigergraph/data/gstore/0/part//.mv/146/1699550201396/146_SI"
E1112 16:01:42.121706 13615 gthreadpool.cpp:122] Error: boost::filesystem::remove: Directory not empty: "/home/tigergraph/tigergraph/data/gstore/0/part//.mv/101/1699637312348/101_SI"
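
For anyone triaging a similar log, here is a minimal sketch of pulling the directories out of the "Directory not empty" errors so they can be inspected on disk. It assumes the glog-style line format shown in the log header above; the function name `leftover_directories` is hypothetical and not part of any TigerGraph tooling.

```python
import re

# Matches glog-style lines: level letter + mmdd, time, thread id, file:line] message
LOG_LINE = re.compile(
    r'^(?P<level>[IWEF])(?P<mmdd>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6}) +'
    r'(?P<thread>\d+) (?P<source>[^\]]+)\] (?P<msg>.*)$'
)
# Pulls the quoted path out of a boost "Directory not empty" message
NOT_EMPTY = re.compile(r'boost::filesystem::remove: Directory not empty: "([^"]+)"')


def leftover_directories(lines):
    """Return the directories named in 'Directory not empty' errors, in log order."""
    paths = []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if not m or m.group("level") != "E":
            continue  # skip headers, blank lines, and non-error levels
        hit = NOT_EMPTY.search(m.group("msg"))
        if hit:
            paths.append(hit.group(1))
    return paths


# Two lines copied from the log above
sample = [
    'E1112 16:01:32.251708 10901 gpedaemon.cpp:86] Running as a leader |time:16:01:32.251812',
    'E1112 16:01:36.778251 13615 gthreadpool.cpp:122] Error: boost::filesystem::remove: '
    'Directory not empty: "/home/tigergraph/tigergraph/data/gstore/0/part//.mv/174/1699544446270/174_SI"',
]
for path in leftover_directories(sample):
    print(path)
```

This only lists the paths. Whether those directories are safe to touch is a question for support, so the sketch deliberately does not delete anything.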

The GPE then crashed again after it was restarted.

Here is the error reported by the backup command:

[ Error] InternalError (backup failed; missing the good backup(snapshot) response for the service GPE partition 1, error: InternalError (GPE_1#1:Precheck failed because there are still segments in rebuilding after 10min, pls retry later.))

After restarting all services, even a simple GSQL query always returns this error:

{"error":true,"message":"The query didn't finish because it exceeded the query timeout threshold (16 seconds). Please check GSE log for license expiration and RESTPP/GPE log with request id (131086.RESTPP_1_1.1699814970056.N) for details. Try increase RESTPP.Factory.DefaultQueryTimeoutSec or add header GSQL-TIMEOUT to override default system timeout. ","results":[],"code":"REST-3002"}
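
The error message suggests adding a GSQL-TIMEOUT header to override the default timeout. Below is a rough sketch of what that could look like when calling an installed query over REST. Everything specific here is an assumption: the host and port (`localhost:9000`), the `/query/{graph}/{query}` path, the graph and query names, and the header value being in milliseconds. Please check these against the RESTPP documentation for your version. The request is only built, not sent.

```python
import urllib.request


def build_query_request(base_url, graph, query, timeout_ms):
    """Build (but do not send) a GET request that carries a GSQL-TIMEOUT header."""
    # Assumed endpoint layout for an installed query; verify for your version.
    url = f"{base_url}/query/{graph}/{query}"
    req = urllib.request.Request(url, method="GET")
    # Assumed to be a per-request override, expressed in milliseconds.
    req.add_header("GSQL-TIMEOUT", str(timeout_ms))
    return req


# Hypothetical graph and query names, 60-second timeout
req = build_query_request("http://localhost:9000", "MyGraph", "my_query", 60000)
print(req.full_url)
print(dict(req.header_items()))
# To actually send it: urllib.request.urlopen(req)
```

A longer timeout is unlikely to help while the GPE itself is not ready, so this is mainly useful for ruling out a genuinely slow query.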

The GPE service repeatedly shows a Warmup status.

Hi @linb, I’ve been trying to replicate your backup issue, but haven’t been successful. The Technical Support team may need to look into your logs to pinpoint the backup issue. You can submit a formal request here: Zendesk Auth

The latest error message mentions “license expiration”. Can you confirm that your license is up to date? (gadmin license status)

Another common cause of failing services is hitting hardware limits. This guide walks through a few basic troubleshooting steps to rule those out: Troubleshooting Guide :: TigerGraph Server
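
As a quick first pass before working through the guide, something like the sketch below can show whether disk space or CPU load looks tight on the host. The 20% free-space threshold and the function names are my own assumptions, not official guidance, and `os.getloadavg` is only available on Unix-like systems. Point `disk_report` at the volume that holds your TigerGraph data directory.

```python
import os
import shutil


def disk_report(path, min_free_fraction=0.2):
    """Summarize free space on the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    free_fraction = usage.free / usage.total if usage.total else 0.0
    return {
        "path": path,
        "total_gb": round(usage.total / 1e9, 1),
        "free_gb": round(usage.free / 1e9, 1),
        "free_fraction": free_fraction,
        # Threshold is an illustrative assumption
        "low": free_fraction < min_free_fraction,
    }


def load_report():
    """Compare the 1-minute load average with the CPU count (Unix only)."""
    cpus = os.cpu_count() or 1
    try:
        load1 = os.getloadavg()[0]
    except (AttributeError, OSError):
        return {"cpus": cpus, "load1": None, "overloaded": None}
    return {"cpus": cpus, "load1": load1, "overloaded": load1 > cpus}


# "/" is a placeholder; use the mount that holds the data directory
print(disk_report("/"))
print(load_report())
```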