Hello @mbatz,
Today I encountered a strange situation with the application.
I have two instances of the application on different servers; the situation occurred on both instances, although it manifested differently.
Both instances had been running since 17/06/2020, and I should add that nothing was written to the application logs, neither warnings nor errors.
Namely:
Instance 1 - the application does not open the following tabs:
Categories, Import Type, Export Type, Import Objects, Export Objects, System Information, Authentication, Database Properties, Rights,
Instance 2 - in addition to the above, also Exportd Job Logs, Users, Groups, Profile
When I clicked these tabs in the application, nothing appeared in the logs, as if the application had stopped working.
However, when I clicked the Types tab or browsed through objects, it worked.
I also tried entering a URL path directly, e.g. /framework/category, and this did not help either; after reloading, the page returned to the empty base URL ip_address:4000 on both instances.
I tried Chrome and Firefox, and the results were the same.
After restarting everything (mongod, rabbitmq and datagerry) on one instance, the application started working correctly again on that instance.
Can you reproduce that behavior, or was that a single case? As I can see port 4000 in the screenshot, do you access the DATAGERRY webserver directly? We recommend running the application behind an Nginx proxy server for performance reasons.
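For reference, a minimal reverse proxy setup could look roughly like the sketch below; the server name is a placeholder, and the backend address and port are assumptions based on the defaults mentioned in this thread:

# minimal Nginx reverse proxy sketch for DATAGERRY (assumed backend: 127.0.0.1:4000)
server {
    listen 80;
    server_name datagerry.example.com;   # hypothetical hostname

    location / {
        proxy_pass http://127.0.0.1:4000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}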
Hi,
So far, this has only happened once.
Yes, one instance works behind Nginx and the second works without Nginx, and it happened on both instances.
I wrote ip_address:4000 because I can't show you the real addresses I use.
I can't reproduce this situation. Maybe if I don't restart the application for 14 days we could reproduce that behavior, or maybe someone else running DATAGERRY has experienced something similar?
Just let me know when this error appears again. I haven't seen this in any of our setups yet, but if we see that error again, we need to pay attention to it.
Hello @mbatz,
As we agreed, now that the problem has reappeared I am returning to the topic.
Today I wanted to check the logs on the Object Logs page,
and I can't get there.
When I click the "Object Logs" button, the menu hides and nothing more happens. I'm still in the same place I was before, and I can't get to the logs. So I checked other tabs, and I also can't open the pages behind these tabs:
System Information
Authentication
Database Properties
Rights
I tested everything in Chrome.
It doesn't work in other web browsers either.
I can also add that when I clicked on working tabs I saw corresponding entries in the log, but when I clicked on the tabs that don't work, no information appeared in the log at all.
Thanks for your detailed report. Did a restart of DATAGERRY solve the problem? It seems that in some rare cases (currently I have not seen this in any of our customer setups), the backend no longer answers HTTP requests. As I could not reproduce it in my testing environment, this is hard to debug. I will discuss the options with the team.
Whenever I restart DATAGERRY, it works fine again.
In my case both instances behave the same: after 14 days the application can no longer work normally with every tab.
The instances run on virtual machines with CentOS 7 and CentOS Stream (now major version 8).
They do not run in Docker; they have 4 GB and 8 GB RAM, with 2 and 4 vCPUs.
One of the instances runs behind Nginx. Only these applications run on those virtual machines.
Hello,
Today it happened again, exactly after 14 days: I could not enter various tabs.
I wanted to check the logs, but there were only 3 files:
exportd.log
webapp.log
webserver.access.log
Only a restart helped.
Yesterday I did some analysis on this, as I saw such a behavior in my development environment. DATAGERRY was started in the foreground and I did some resizing of the SSH terminal window. This caused the signal "SIGWINCH" to be sent. The internal webserver we use for the DATAGERRY backend, gunicorn, will handle that signal by shutting down its worker processes. This was implemented for a specific use case. After that, the DATAGERRY webserver was not responding anymore. This only happened when starting DATAGERRY in the foreground, and I could not reproduce that behavior in any other setting (Docker, running as a daemon, running in the background). So that should not be the reason for the issues in your setup. But what we could see were log entries in the webserver.error.log. Every time a worker was closed or anything else happened there, gunicorn created a log entry. In our example yesterday, we saw the following logs:
That makes me think that the DATAGERRY webserver should not be the problem in your setup, as I cannot find any log entries in your webserver.error.log file. Could it be another issue on your machines? Maybe something like firewalld or anything else. Can you try to access the DATAGERRY backend with curl, both from a remote host and from the local DATAGERRY machine, the next time that happens?
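A quick check could look roughly like this; the address and port are placeholders matching the defaults mentioned in this thread:

# on the DATAGERRY machine itself
curl -v http://127.0.0.1:4000/
# from a remote host (replace ip_address with the real server address)
curl -v http://ip_address:4000/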
Hello,
Today, after entering the URL of my DATAGERRY instance, I got a blank page. I checked in Firefox and Chrome, and the results were the same.
I checked the debugger in Chrome and saw:
DATAGERRY
Please enable JavaScript to continue using this application.
curl from a remote host shows:
<!doctype html>
DATAGERRY
Please enable JavaScript to continue using this application.
The find command shows only 2 files:
find /tmp/_MEIR63D85/ -type f
/tmp/_MEIR63D85/cmdb/interface/net_app/DATAGERRYApp/index.html
/tmp/_MEIR63D85/logs/webserver.access.log
Is it possible that /tmp on your machine is cleared after some time? When you start the DATAGERRY binary, it extracts content (like a Python interpreter and the DATAGERRY code) into a subdirectory of /tmp, which in your case is /tmp/_MEIR63D85. If some of the files are deleted while DATAGERRY is running, a crash of the application is possible.
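One way to check this theory the next time it happens could be to look at the timestamps of a file still left in the extraction directory (the path below is taken from the find output above) and compare them with the clean-up age:

# show atime/mtime/ctime of a remaining extracted file
stat /tmp/_MEIR63D85/cmdb/interface/net_app/DATAGERRYApp/index.html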
Hi @mbatz,
It could be this, because I haven't changed anything and the default configuration for cleaning /tmp is in place.
I have now changed the configuration and will check what happens next.
I also think that, if you are right, some mechanism should be developed to protect against such a situation: for example, refreshing all files once a day or when starting the application, or shipping an entry in the /tmp cleaning configuration that protects against the clean-up, since the application unpacks exactly into /tmp. An unchanging directory name would also make such an entry in the /tmp cleaning configuration easier.
Best of all would be the possibility to set the exact directory into which the application unpacks, so that /tmp can be omitted and the instance created elsewhere.
I did some research on that. It is systemd-tmpfiles which cleans files in some directories (like /tmp) based on a configuration. Configuration files for systemd-tmpfiles are placed in /usr/lib/tmpfiles.d and /etc/tmpfiles.d. On my CentOS development box, the default configuration for /tmp is defined to clean all files that are older than 10d (ctime, atime and mtime). Unfortunately, we cannot manipulate the timestamps of the files placed in /tmp, as we use a library (PyInstaller) to create the binary, which does not provide such a functionality.
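For reference, on a stock CentOS box the relevant defaults in /usr/lib/tmpfiles.d/tmp.conf look roughly like this (the exact contents may differ between releases):

# clean /tmp after 10 days, /var/tmp after 30 days
v /tmp 1777 root root 10d
v /var/tmp 1777 root root 30d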
In future releases, we'll roll out a configuration file in /usr/lib/tmpfiles.d/datagerry.conf, which prevents systemd-tmpfiles from deleting our files in /tmp:
# systemd tmpfiles exclude file for DATAGERRY
# Exclude PyInstaller temporary files
x /tmp/_MEI*
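Until that file ships with a release, the same exclusion can be created by hand; placing it under /etc/tmpfiles.d is one option, since files there take precedence over files of the same name in /usr/lib/tmpfiles.d:

# create the exclusion manually (run as root)
echo 'x /tmp/_MEI*' > /etc/tmpfiles.d/datagerry.conf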
Hello,
Sure, got it.
This will be a great solution, because then nobody who newly installs DATAGERRY will run into crashing functionality in a running application caused by systemd-tmpfiles removing files from the /tmp directory.
So far, I have implemented a similar entry in the main tmp.conf configuration file:
X /tmp/_MEI*
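One note based on the tmpfiles.d man page: lowercase "x" excludes a matching path and, if it is a directory, everything below it from clean-up, while uppercase "X" excludes only the matching path itself, not its contents. If the whole extracted _MEI directory tree should survive, the lowercase variant from the post above may be the safer choice.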