Isolating the Cause of a Server Crash

Isolating the Cause of a Server Crash

4D Tech mailing list
One of our systems is crashing about every 3 days and I can't seem to
isolate the cause. Lately these are crashes with a Mac crash report
appearing on the screen.
Some system details are:
 - 4D Built Server app with v17.0 HF1 (64-bit Server with 64-bit Mac and
32-bit Windows Clients)
 - Mac and Windows Clients
 - Mac OS 10.13.5

What I know so far:
 - I have the Server Debug file. It ends with a "." and so the last
command appears to have executed.
 - I'm using the Report Info component, logging every 5 minutes. There
don't seem to be memory problems or runaway cache issues.
 - I also know who was on each time it crashed and sent out an email
to those users to find patterns (so far I've found none).
 - The crashes typically happen around 10am to 11am.
 - The client and server builds match.

I'm debating turning on the client debugger files and then harvesting
them afterwards when the user logs back in. I'm open to other
debugging techniques.

There are other v17 systems running on the same machine with zero issue.

Below is a snippet of the crash report. It seems to be different each
time, but here is the latest. Thread 73 crashed, so I only included
that one.

Thanks,

dave nasralla

------------------------------------
Process:               Corporate [93958]
Path:                  /Users/USER/*/Corporate Server.app/Contents/MacOS/Corporate
Identifier:            4d.com.Corporate Server.app
Version:               17.0 build 17.226566 (???)
Code Type:             X86-64 (Native)
Parent Process:        ??? [1]
Responsible:           Corporate [93958]
User ID:               501

Date/Time:             2018-08-31 11:00:05.952 -0500
OS Version:            Mac OS X 10.13.5 (17F77)
Report Version:        12
Anonymous UUID:        723511FD-4CA0-6E8B-0642-883209248DFC


Time Awake Since Boot: 3700000 seconds

System Integrity Protection: enabled

Crashed Thread:        73  LabProjects List (id = -114)

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       EXC_I386_GPFLT
Exception Note:        EXC_CORPSE_NOTIFY

Termination Signal:    Segmentation fault: 11
Termination Reason:    Namespace SIGNAL, Code 0xb
Terminating Process:   exc handler [0]
----------------------------------------------------------


Thread 73 Crashed:: LabProjects List (id = -114)
0   4d.com.Corporate Server.app       0x000000010694fdbe V4DConnection::OnPostpone(bool) + 40
1   4d.com.Corporate Server.app       0x0000000106b095f7 V4DServerUser::PostponeServiceConnection() + 35
2   4d.com.Corporate Server.app       0x0000000106b20567 V4DServer::exec_ConnectionPostpone(V4DRequestReply&, V4DTaskConcrete*, short) + 395
3   4d.com.Corporate Server.app       0x0000000106b211ca V4DServer::exec_streamreq(V4DRequestReply&, V4DTaskConcrete*) + 100
4   4d.com.Corporate Server.app       0x0000000106b1ddc3 V4DServer::execreq(V4DRequestReply&, V4DTaskConcrete*) + 297
5   4d.com.Corporate Server.app       0x0000000106b1c96f V4DServer::ReadAndExecuteRequest(V4DTaskConcrete*, void*, void*, int) + 349
6   4d.com.Corporate Server.app       0x0000000106b1da6f V4DServer::_ClientTask(V4DTaskConcrete*, VTaskParams_gereclient*) + 429
7   4d.com.Corporate Server.app       0x0000000106b1cc8f V4DServer::ClientTask(V4DTaskConcrete*, xbox::IRefCountable*) + 81
8   4d.com.Corporate Server.app       0x0000000106928d93 Task4DProc(V4DTaskConcrete*) + 903
9   4d.com.Corporate Server.app       0x000000010696fe00 V4DTaskManager::_Task4DProc(xbox::VTask*) + 158
10  com.4d.kernel                     0x0000000108a34d4d xbox::VTask::_Run() + 141
11  com.4d.kernel                     0x0000000108a3a0c6 xbox::XMacTask_fiber::_ThreadProc(void*) + 70
12  com.4d.kernel                     0x0000000108a6f0df xbox::VMacFiber_thread::_ThreadProc(void*) + 31
13  com.apple.CoreServices.CarbonCore    0x00007fff504c3072 CooperativeThread + 282
14  libsystem_pthread.dylib           0x00007fff77464661 _pthread_body + 340
15  libsystem_pthread.dylib           0x00007fff7746450d _pthread_start + 377
16  libsystem_pthread.dylib           0x00007fff77463bf9 thread_start + 13


--
David Nasralla
Clean Air Engineering
**********************************************************************
4D Internet Users Group (4D iNUG)
Archive:  http://lists.4d.com/archives.html
Options: https://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[hidden email]
**********************************************************************

Re: Isolating the Cause of a Server Crash

4D Tech mailing list
reboot the computer. it has been running for 40 days?
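
The 40-day figure can be checked against the "Time Awake Since Boot" field in the crash report. A minimal sketch, assuming the server never sleeps (so time awake roughly equals uptime) and that the reported value is rounded:

```python
from datetime import timedelta

# Value of "Time Awake Since Boot" from the crash report, in seconds.
time_awake_seconds = 3_700_000

uptime = timedelta(seconds=time_awake_seconds)
print(f"{uptime.days} days, {uptime.seconds // 3600} hours")  # 42 days, 19 hours
```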

> On Aug 31, 2018, at 1:54 PM, Dave Nasralla via 4D_Tech <[hidden email]> wrote:
>
> One of our systems is crashing about every 3 days

Re: Isolating the Cause of a Server Crash

4D Tech mailing list
Are you running any virus detection software on that machine? If so, you should have it skip the 4D folders.

Regards

Chuck
------------------------------------------------------------------------------------------------
 Chuck Miller Voice: (617) 739-0306
 Informed Solutions, Inc. Fax: (617) 232-1064      
 mailto:cjmiller<AT SIGN>informed-solutions.com
 Brookline, MA 02446 USA Registered 4D Developer                
       Providers of 4D and Sybase connectivity
          http://www.informed-solutions.com 
------------------------------------------------------------------------------------------------
This message and any attached documents contain information which may be confidential, subject to privilege or exempt from disclosure under applicable law.  These materials are intended only for the use of the intended recipient. If you are not the intended recipient of this transmission, you are hereby notified that any distribution, disclosure, printing, copying, storage, modification or the taking of any action in reliance upon this transmission is strictly prohibited.  Delivery of this message to any person other than the intended recipient shall not compromise or waive such confidentiality, privilege or exemption from disclosure as to this communication.

> On Aug 31, 2018, at 5:17 PM, Spencer Hinsdale via 4D_Tech <[hidden email]> wrote:
>
> reboot the computer. it has been running for 40 days?


RE: Isolating the Cause of a Server Crash

4D Tech mailing list
I strongly recommend what Chuck is saying.  We tell our customers to exempt our folders from any scanning, virus, auto-bots, etc...

We have seen database damage caused by this, which in turn results in crashing.

Steve



Re: Isolating the Cause of a Server Crash

4D Tech mailing list
Thanks to all that have responded.
 - I rebooted the machine this evening. (In the past it has run as
long as a year without a reboot - which was only done for a system
update.)
 - No virus scans running on it
 - Backblaze runs, but the .4DD files are skipped.
 - MSC scans came back clean
 - Indexes have been rebuilt

One thing I have noticed is that, although the client machines are
running along fine and users can log in or out and do their tasks, the
4D Administration Interface on the built application gets wonky. For
example, after running for a day, the "Monitor" tab will no longer
show a graph and the Details area (with the pie charts) is blank with
a message something like (only visible to database administrators). Or
I'll go to the Users tab and nothing shows up, yet users are
connected.

Other 4D applications are fine.

dave

--
David Nasralla
Clean Air Engineering

Re: Isolating the Cause of a Server Crash

4D Tech mailing list
Hi Dave,

Crashing every 3 days is a real problem and totally unacceptable. So what can be done to try and make this situation better? We need to make changes to make this crashing stop. But what changes?

Here is my thinking as I read this crash report. Keep in mind I’m not an expert on this, so I may be wrong in some areas. If I am wrong hopefully those that know more can correct me — and in turn help me and others understand more about how to read these macOS crash reports. (Thinking about Miyako, JPR, Christian Sakowski and Rob Laveaux — they are real experts in this area. Real macOS programmers that know how to read these things properly.)

The crash report is supposed to provide a programmer with information on exactly where the program crashed and the cause of the crash. If you have the special 4D “debug” version it will contain more “symbols” and thus when 4D crashes you get better names for functions instead of just memory address offsets. I think you even get 4D command names that were involved in the crash. But the basic crash dump info that we have here can help point to the general area of concern. Here is a website that helps explain crash dumps and how to read them:

https://www.maketecheasier.com/read-macos-crash-reports-troubleshoot-mac/

This is 4D v17.0 build 226566 that is running compiled in 64bit mode (Code Type: x86-64). So first thought is that this could be a 4D 64bit issue. That’s important because some of the code is completely different between 32bit 4D and 64bit 4D. The 64bit code could be newly written code, the 32bit code could be legacy code that has been around for years.

Thread 73 “LabProjects List” is what crashed. Do you have a table named “LabProjects” or maybe a MODIFY SELECTION or a listbox window that shows records in this table? Or a process that has that name? Makes me think that you do. That’s another pointer to where in your application the crashing problem occurred.

Exception Type is “EXC_BAD_ACCESS (SIGSEGV)” and that means “the program attempts to access memory incorrectly or with an invalid address”. Could be a C pointer that went bad or something to do with virtual memory or even how 4D allocates its own memory internally. Could be 4D data cache related. Basically 4D tried to access memory it was not allowed to access and macOS killed 4D so that it could not damage other parts of the system and cause them to crash. Thank you macOS for watching out and protecting us from complete system corruption and crashing. Windows does this too.

The last area is where we can see exactly where in 4D (down to the name of the 4D C++ function) the code was running when macOS said “enough, this application has gone crazy, I need to kill it before it does damage to other applications.” The functions are listed in reverse chronological order, so the one at the bottom is where the “call chain” started. The one at the top is where it died.

The function name is “V4DConnection::OnPostpone(bool)” and the code 40 bytes from the start of that function is where the offending memory access occurred. The name “V4DConnection” makes me think this is related to networking, 4D Server handling network actions with 4D Client. The “OnPostpone” makes me think this is somehow related to sleeping or a 4D Client connection that has been asleep and needs to now wake up. And lastly it makes me think “this is related to the new network layer code”. Again, this is just my thinking. I could be completely wrong about all of this.
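
To compare the top frames across several crash reports, the stack lines can be pulled apart mechanically. What follows is only a rough sketch: it assumes each frame sits on a single line (frames wrapped by an email client have to be re-joined first), and `parse_frames` and `FRAME_RE` are names made up for this example, not part of any 4D or Apple tool.

```python
import re

# One frame per line: index, image name, address, then "symbol + offset".
FRAME_RE = re.compile(
    r"^(?P<index>\d+)[ \t]+(?P<image>.+?)[ \t]+(?P<address>0x[0-9a-fA-F]+)[ \t]+"
    r"(?P<symbol>.+?) \+ (?P<offset>\d+)[ \t]*$",
    re.MULTILINE,
)

# First three frames of the crashed thread from this report.
SAMPLE = """\
Thread 73 Crashed:: LabProjects List (id = -114)
0   4d.com.Corporate Server.app       0x000000010694fdbe V4DConnection::OnPostpone(bool) + 40
1   4d.com.Corporate Server.app       0x0000000106b095f7 V4DServerUser::PostponeServiceConnection() + 35
2   4d.com.Corporate Server.app       0x0000000106b20567 V4DServer::exec_ConnectionPostpone(V4DRequestReply&, V4DTaskConcrete*, short) + 395
"""


def parse_frames(text):
    """Return the stack frames found in text as a list of dicts, top frame first."""
    frames = []
    for m in FRAME_RE.finditer(text):
        frames.append({
            "index": int(m.group("index")),
            "image": m.group("image"),
            "symbol": m.group("symbol"),
            "offset": int(m.group("offset")),
        })
    return frames


frames = parse_frames(SAMPLE)
top = frames[0]
print(top["symbol"], "+", top["offset"])  # V4DConnection::OnPostpone(bool) + 40
```

Running this over each saved report and comparing the top few symbols would show whether the crashes really differ each time or keep ending in the same function.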

So now my brain tries to build a scenario that could most likely happen that could be connected to this situation. Happens during the day between 10am and 11am. It’s a work day with users connected. People came in to work, got connected to 4D Server, then wandered off to a meeting or something and their computers went to sleep. You are using 4D Server compiled 64bit so you MUST be using the new network layer. Legacy is only available in 32bit compiled 4D Server on macOS.
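
The time-of-day pattern can be checked by tallying the "Date/Time" header of each saved crash report by hour. A small sketch, with the caveat that only the first timestamp below comes from this thread; the other two are invented placeholders, and `crash_hour` is a helper name chosen for this example.

```python
from collections import Counter
from datetime import datetime

# "Date/Time" header lines from several crash reports. Only the first value
# comes from this thread; the others are made-up placeholders.
date_time_lines = [
    "Date/Time:             2018-08-31 11:00:05.952 -0500",
    "Date/Time:             2018-08-28 10:41:17.120 -0500",
    "Date/Time:             2018-08-24 10:12:44.003 -0500",
]


def crash_hour(line):
    """Return the local hour (0-23) from a crash report 'Date/Time:' line."""
    stamp = line.split(":", 1)[1].strip()
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f %z").hour


by_hour = Counter(crash_hour(line) for line in date_time_lines)
for hour, count in sorted(by_hour.items()):
    # One '#' per crash in that hour, as a crude text histogram.
    print(f"{hour:02d}:00  {'#' * count}")
```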

There is this new network layer feature where if a 4D Client machine goes into sleep mode you don’t lose your 4D Server connection. So that when the user wakes up the 4D Client machine it notifies 4D Server and the old network connection is reenergized and brought back to life. That “OnPostpone” mentioned above makes me think this also. Maybe something went wrong in that area of 4D. It is a tricky area because sleep could last for hours or days and memory could be moved around and pointers can easily go bad in those types of situations.

So there is my analysis. Now what changes could you make to stop these damn crashing situations? Here are some ideas:

- You say it happens about every 3 days, so just restart 4D Server every single day. Giant PITA I know. But just an idea for what to do now to eliminate the crashing.

- Stop all 4D Client machines from sleeping. You’d have to physically go to every machine and turn off system sleep while still allowing the display to go to sleep. You can’t rely on users to do this, and do it right. This is what I would do if I had physical access to all the machines, or at least RDP access, so that I could make sure every machine had system sleep turned off. (Of course you already have App Nap turned off on the 4D Server machine so that’s not part of this issue, right?)

- Crash dump lists Build Number 226566. v17.0 has build 225365. v17.0 HF1 has build 226237. A quick check of 4D forums “Nightly Builds 4D v17” shows this build is from 8/22/18. So you are running a nightly build. I’m guessing you used v17.0 and had problems, went to v17.0 HF1 and still had problems, so you went to nightly builds to try and find a fix. Maybe you keep doing that. Current nightly build is 226837. You may find they’ve fixed the bug that is biting you.

- Stop using the new network layer. You would have to stop using 64bit 4D Server so this may not be a viable option. You would be limited to a 2GB data cache. But maybe if you can stop the crashing now it is worth that limitation. That means compiling a 32bit version of 4D Server and 4D Client, and replacing all the 64bit 4D Client applications with the 32bit version. I think you could use the auto client update feature to automate this.
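
The build-number check from the nightly-build idea above can be sketched as follows. The two release build numbers are the ones quoted in this message; the assumption that a higher number means a later build, and the helper names `build_from_version_line` and `describe_build`, are mine.

```python
import re

# Build numbers quoted in this thread; the labels are only for display.
KNOWN_BUILDS = {
    225365: "v17.0",
    226237: "v17.0 HF1",
}


def build_from_version_line(line):
    """Extract the build number from a crash report 'Version:' line."""
    m = re.search(r"build\s+\d+\.(\d+)", line)
    return int(m.group(1)) if m else None


def describe_build(build):
    """Label a build as a known release or as newer than the latest one listed."""
    if build in KNOWN_BUILDS:
        return KNOWN_BUILDS[build]
    newest = max(KNOWN_BUILDS)
    if build > newest:
        # Assumes build numbers only ever increase over time.
        return f"newer than {KNOWN_BUILDS[newest]} (possibly a nightly build)"
    return "not a known release build"


build = build_from_version_line("Version:               17.0 build 17.226566 (???)")
print(build, "->", describe_build(build))
```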

That’s all I can contribute. If you find a solution to your crashing every 3 days problem please be sure to post here so we know what fixed it for you.

Tim

*****************************************
Tim Nevels
Innovative Solutions
785-749-3444
[hidden email]
*****************************************



Re: Isolating the Cause of a Server Crash

4D Tech mailing list
[JPR]

Hi Dave, Tim,

This kind of crash is always difficult to track down, for it is not easily reproducible. From what I see (and as Tim pointed out) it seems there is a memory problem that is revealed in the process LabProjects List. But a memory problem can occur a while before the actual crash, because the application may have corrupted memory and not be aware of it until the crash.

- Is your application compiled? If yes, be sure that the Range checking option is set.
- Is the LabProjects List process a client process on the server, or a worker or process running on the server?
- The time of the crash seems irrelevant, but maybe it's linked to a peak in activity and server or network stress?
- A client problem causing a server crash is unlikely, but it may help to know if there is a correlation between the crash and a particular client doing a particular operation.
- Do you know which method is executed when it crashes?
- Do you use interprocess variables like arrays for instance?
- How much memory has been given to the server and to the cache?

This is just a short list of points to check, but it may help to reduce the problem to a small part of the application.
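
For the correlation question in the list above, one simple approach is to intersect the sets of users connected at each crash. This is only an illustration: the user names and the data are invented, and how the connected-user list is captured at crash time is left open.

```python
from collections import Counter

# Hypothetical data: the users connected at each crash (names are made up).
users_at_crash = [
    {"alice", "bob", "carol"},
    {"bob", "dave", "carol"},
    {"bob", "erin"},
]

# Users who were connected at every single crash.
always_present = set.intersection(*users_at_crash)

# How many crashes each user was connected for.
presence = Counter(user for users in users_at_crash for user in users)

print(sorted(always_present))      # ['bob']
print(presence.most_common(2))     # [('bob', 3), ('carol', 2)]
```

A user who shows up in every crash is not proof of a cause, but it narrows down which client sessions and operations are worth logging in more detail.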

My very best,

JPR


> On 2 Sep 2018, at 21:00, [hidden email] wrote:
>
> From: Tim Nevels <[hidden email]>
> To: [hidden email]
> Subject: Re: Isolating the Cause of a Server Crash
> Message-ID: <[hidden email]>
> Content-Type: text/plain; charset=utf-8
>
> On Sep 1, 2018, at 2:00 PM, Dave Nasralla wrote:
>
>> One of our systems is crashing about every 3 days and I can't seem to
>> isolate the cause. Lately these are crashes with a Mac crash report
>> appearing on the screen.
>> Some system details are:
>> - 4D Built Server app with v17.0 HF1 (64 bit Server with 64 Mac and
>> 32 bit Windows Clients)
>> - Mac and Windows Clients
>> - Mac OS 10.13.5
>>
>> What I know so far:
>> - I have the Server Debug file. It ends with a "." and so the last
>> command appears to have executed.
>> - I'm using the Report Info component, logging every 5 minutes. There
>> doesn't seem to be memory problems or run away cache issues.
>> - I also know who was one each time it crashes and said out an email
>> to those users to find patterns (so far I've found none).
>> - The crashes typically happen around 10am to 11am.
>> - The client and server builds match.
>>
>> I'm debating turning on the client debugger files and then harvesting
>> them afterwards when the user logs back in. I'm open to other
>> debugging techniques.
>>
>> There are other v17 systems running on the same machine with zero issue.
>>
>> Below is a snippet of the crash report. It seems to be different each
>> time, but here is the latest. Thread 73 crashed, so I only included
>> that one.
>>
>> Thanks,
>>
>> dave nasralla
>> ------------------------------------
>> Process:               Corporate [93958]
>> Path:                  /Users/USER/*/Corporate
>> Server.app/Contents/MacOS/Corporate
>> Identifier:            4d.com.Corporate Server.app
>> Version:               17.0 build 17.226566 (???)
>> Code Type:             X86-64 (Native)
>> Parent Process:        ??? [1]
>> Responsible:           Corporate [93958]
>> User ID:               501
>>
>> Date/Time:             2018-08-31 11:00:05.952 -0500
>> OS Version:            Mac OS X 10.13.5 (17F77)
>> Report Version:        12
>> Anonymous UUID:        723511FD-4CA0-6E8B-0642-883209248DFC
>>
>>
>> Time Awake Since Boot: 3700000 seconds
>>
>> System Integrity Protection: enabled
>>
>> Crashed Thread:        73  LabProjects List (id = -114)
>>
>> Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
>> Exception Codes:       EXC_I386_GPFLT
>> Exception Note:        EXC_CORPSE_NOTIFY
>>
>> Termination Signal:    Segmentation fault: 11
>> Termination Reason:    Namespace SIGNAL, Code 0xb
>> Terminating Process:   exc handler [0]
>> ----------------------------------------------------------
>>
>>
>> Thread 73 Crashed:: LabProjects List (id = -114)
>> 0   4d.com.Corporate Server.app       0x000000010694fdbe
>> V4DConnection::OnPostpone(bool) + 40
>> 1   4d.com.Corporate Server.app       0x0000000106b095f7
>> V4DServerUser::PostponeServiceConnection() + 35
>> 2   4d.com.Corporate Server.app       0x0000000106b20567
>> V4DServer::exec_ConnectionPostpone(V4DRequestReply&, V4DTaskConcrete*,
>> short) + 395
>> 3   4d.com.Corporate Server.app       0x0000000106b211ca
>> V4DServer::exec_streamreq(V4DRequestReply&, V4DTaskConcrete*) + 100
>
> Hi Dave,
>
> Crashing every 3 days is a real problem and totally unacceptable. So what can be done to try and make this situation better? We need to make changes to make this crashing stop. But what changes?
>
> Here is my thinking as I read this crash report. Keep in mind I’m not an expert on this, so I may be wrong in some areas. If I am wrong hopefully those that know more can correct me — and in turn help me and others understand more about how to read these macOS crash reports. (Thinking about Miyako, JPR, Christian Sakowski and Rob Laveaux — they are real experts in this area. Real macOS programmers that know how to read these things properly.)
>
> The crash report is supposed to provide a programmer with information on exactly here the program crashed and the cause of the crash. If you have the special 4D “debug” version it will contain more “symbols” and thus when 4D crashes you get better names for functions instead of just memory address offset. I think you even get 4D command names that were involved in the crash. But the basic crash dump info that we have here can help point to the general area of concern. Here is a website that helps explain crash dumps and how to read them:
>
> https://www.maketecheasier.com/read-macos-crash-reports-troubleshoot-mac/
>
> This is 4D v17.0 build 226566 that is running compiled in 64bit mode (Code Type: x86-64). So first thought is that this could be a 4D 64bit issue. That’s important because some of the code is completely different between 32bit 4D and 64bit 4D. The 64bit code could be newly written code, the 32bit code could be legacy code that has been around for years.
>
> Thread 73 “LabProjects List” is what crashed. Do you have a table named “LabProjects” or maybe a MODIFY SELECTION or a listbox window that shows records in this table? Or a process that has that name? Makes me think that you do. That’s another pointer to where in your application the crashing problem occurred.
>
> Exception Type is "EXC_BAD_ACCESS (SIGSEGV)” and that means "the program attempts to access memory incorrectly or with an invalid address”. Could be a C pointer that went bad or something doing with virtual memory or even how 4D allocates its own memory internally. Could be 4D data cache related. Basically 4D tried to access memory is was not allowed to access and macOS killed 4D so that it could not damage other parts of the system and cause them to crash. Thank you macOS for watching out and protecting us from complete system corruption and crashing. Windows does this too.
>
> The last area is where we can see exactly where in 4D — and even the 4D C or Objective C function name — that was running when macOS said “enough, this application has gone crazy, I need to kill it before it does damage to other applications.” The functions are listed in reverse chronological order, so the one at the bottom is where the “call chain” started. The one at the top is where it died.
>
> The function name is "V4DConnection::OnPostpone(bool)” and at the code at 40 bytes from the start of that function is where the offending memory address statement occurred. The name “V4DConnection” makes me think this is related to networking, 4D Server handling network actions with 4D Client. The “OnPostpone” makes me think this is somehow related to sleeping or a 4D Client connection that has been asleep and needs to now wake up. And lastly it make me think “this is related to the new network layer code”. Again, this is just my thinking. I could be completely wrong about all of this.
>
> So now my brain tries to build a scenario that could most likely happen that could be connected to this situation. Happens during the day between 10am and 11am. It’s a work day with users connected. People came in to work got connected to 4D Server, then wandered off to a meeting or something and their computer went to sleep. You are using 4D Server compiled 64bit so you MUST be using the new network layer. Legacy is only available in 32bit compiled 4D Server macOS.
>
> There is this new network layer feature where if a 4D Client machine goes into sleep mode you don’t lose your 4D Server connection. When the user wakes up the 4D Client machine it notifies 4D Server and the old network connection is reenergized and brought back to life. That “OnPostpone” mention above makes me think this also. Maybe something went wrong in that area of 4D. It is a tricky area because sleep could last for hours or days and memory could be moved around and pointers can easily go bad in those types of situations.
>
> So there is my analysis. Now what changes could you make to stop these damn crashing situations? Here are some ideas:
>
> - You say it happens about every 3 days, so just restart 4D Server every single day. Giant PITA I know. But just an idea for what to do now to eliminate the crashing.
>
> - Stop all 4D Client machines from sleeping. You’d have to physically go to every machine and turn off system sleeping and allow the display to go to sleep. You can’t rely on users to do this, and do it right. This is what I would do, if I had physical access to all the machines (or at least RDP access) so that I could make sure every machine had system sleep turned off. (Of course you already have App Nap turned off on the 4D Server machine so that’s not part of this issue, right?)
>
> - Crash dump lists Build Number 226566. v17.0 has build 225365. v17.0 HF1 has build 226237. A quick check of 4D forums “Nightly Builds 4D v17” shows this build is from 8/22/18. So you are running a nightly build. I’m guessing you used v17.0 and had problems, went to v17.0 HF1 and still had problems, so you went to nightly builds to try and find a fix. Maybe you keep doing that. Current nightly build is 226837. You may find they’ve fixed the bug that is biting you.
>
> - Stop using the new network layer. You would have to stop using 64bit 4D Server so that may not be a viable option. You are limited to a 2GB data cache. But maybe if you can stop the crashing now it is worth that limitation. That means compiling a 32bit version of 4D Server and 4D Client, and replacing all the 64bit 4D Client applications with the 32bit version. I think you could use the auto client update feature to automate this.
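
The build-number comparison in the nightly-build point above can be done mechanically. Here is a minimal sketch in Python (not 4D code) that pulls the build number out of the crash report's "Version:" line and checks it against the release builds quoted in this thread. The function name and the lookup table are illustrative assumptions, and the table only contains the two release builds mentioned here.

```python
import re

# Release builds as quoted in this thread; extend the table as needed.
KNOWN_RELEASES = {225365: "v17.0", 226237: "v17.0 HF1"}


def classify_build(version_line):
    """Return (build_number, label) for a crash report 'Version:' line."""
    m = re.search(r"build\s+\d+\.(\d+)", version_line)
    if m is None:
        raise ValueError("no build number found in: %r" % version_line)
    build = int(m.group(1))
    label = KNOWN_RELEASES.get(
        build, "not a listed release (possibly a nightly build)")
    return build, label


# The Version line from the crash report in this thread.
print(classify_build("Version:               17.0 build 17.226566 (???)"))
```

A build that is not in the table is only flagged as a possible nightly; confirming that still means checking the build list on the 4D forums.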

**********************************************************************
4D Internet Users Group (4D iNUG)
Archive:  http://lists.4d.com/archives.html
Options: https://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[hidden email]
**********************************************************************

Re: Isolating the Cause of a Server Crash

4D Tech mailing list
Hi JPR,

Thanks for your comments.

 - On each crash report, it is a different thread. Twice it was similar
to what I'll post at the end of this message (ServerNet select I/O
handler). Once it was the LabProjects List, but there is nothing
unique about that list of records.
 - Range checking is on (the application always runs compiled)
 - It could be related to network stress - typically it does happen at
busier hours (never after hours)
 - I do generate debug files. It's not a specific method that is
running. It varies. The last command in the debug file always has a
"." after it. My understanding is that means the command executed
completely.
 - I do use interprocess variables to cache employee data for fast
access to names, email addresses, etc. It is relatively small with 7
parallel arrays containing fewer than 150 elements each. Also some
system settings - also under 100 elements.
 - The cache is set to 1GB. The datafile is 3GB in size.
 - I use the 4D Info Reporter. Tim has walked me through looking at the
results. At first it looked like the Server was running low after a
backup, but I wrote in a purge command that clears it up. At the time
of each crash there is nothing remarkable in the report.
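
One way to test the "busier hours" and "different thread each time" observations is to tally the header fields of every saved crash report. Below is a rough Python sketch, assuming the reports are available as plain text with the "Date/Time:" and "Crashed Thread:" lines intact. The function names are made up for illustration, and the sample header is the one from the first report in this thread.

```python
import re
from collections import Counter
from datetime import datetime

DATE = re.compile(r"^Date/Time:[ \t]+(.+?)[ \t]*$", re.M)
THREAD = re.compile(
    r"^Crashed Thread:[ \t]+(\d+)[ \t]*(.*?)(?:[ \t]*\(id = -?\d+\))?[ \t]*$",
    re.M)


def summarize(report_text):
    """Return (local_hour, thread_name) for one report, or None if absent."""
    d = DATE.search(report_text)
    t = THREAD.search(report_text)
    if d is None or t is None:
        return None
    when = datetime.strptime(d.group(1), "%Y-%m-%d %H:%M:%S.%f %z")
    return when.hour, t.group(2)


def tally(reports):
    """Count crashes per hour of day and per crashed thread name."""
    by_hour, by_thread = Counter(), Counter()
    for text in reports:
        summary = summarize(text)
        if summary is None:
            continue  # not a crash report, or header lines missing
        by_hour[summary[0]] += 1
        by_thread[summary[1]] += 1
    return by_hour, by_thread


SAMPLE = """\
Date/Time:             2018-08-31 11:00:05.952 -0500
OS Version:            Mac OS X 10.13.5 (17F77)
Crashed Thread:        73  LabProjects List (id = -114)
Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
"""

print(tally([SAMPLE]))
```

The "(id = ...)" suffix is dropped from the thread name so that the same process name is counted together across reports.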

I think you are correct, that it probably is not a client issue -
though I do use routines that have the "execute on server" box
checked. Either way, I uploaded a modification last night that turns
on client debugging and creates a session record in a table at the
start of a client session. If the client record is not closed out via
the "On Exit" method, when the user logs in again the system will
upload their debug files (max of two are created). On the next crash
I'll take a closer look to see what clients were doing.
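
The unclosed-session check described above boils down to a small piece of bookkeeping. Here is a sketch in Python rather than 4D, with an in-memory list standing in for the session table. The class, method, and field names are all hypothetical, the user and timestamps in the demo are made up, and the actual upload of the debug files is left out.

```python
class SessionTable:
    """In-memory stand-in for the session table (hypothetical names)."""

    def __init__(self):
        self.rows = []

    def start_session(self, user, started):
        """Open a session; return this user's earlier sessions never closed."""
        unclean = [r for r in self.rows
                   if r["user"] == user and r["status"] == "open"]
        for row in unclean:
            # Flag once, so the same debug files are not harvested twice.
            row["status"] = "unclean"
        self.rows.append({"user": user, "started": started, "status": "open"})
        return unclean

    def end_session(self, user):
        """Called from the equivalent of the On Exit method."""
        for row in self.rows:
            if row["user"] == user and row["status"] == "open":
                row["status"] = "closed"


table = SessionTable()
table.start_session("alice", "2018-09-04 08:00")
table.end_session("alice")                         # clean quit
table.start_session("alice", "2018-09-05 08:00")   # crash: On Exit never runs
leftover = table.start_session("alice", "2018-09-05 11:30")
for row in leftover:
    print("harvest debug files for session started", row["started"])
```

Only sessions left open trigger a harvest, so a normal quit produces no upload.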

One thing that bothers me is that on occasion the Administration
interface stops displaying information. For example, when I went to
quit the application last night for an update, the window appeared
asking how to quit. I told the system to shut down in 1 minute. The
next dialog contained only a server icon, and the countdown clock
stuck at "00 00". No text or message was displayed. The server did
shut down as requested in 1 minute.

Thanks for your questions. Another sample crash report is below.

dave


Thread 29 Crashed:: ServerNet select I/O handler (id = 90423)
0   com.4d.ServerNet                  0x0000000110d5837e
xbox::VTCPSelectWatchAction::HandleError(fd_set*) + 38
1   com.4d.ServerNet                  0x0000000110d589ea
xbox::VTCPSelectIOHandler::DoRun() + 712
2   com.4d.ServerNet                  0x0000000110d58afd non-virtual
thunk to xbox::VTCPSelectIOHandler::DoRun() + 13
3   com.4d.kernel                     0x0000000110bbadaa
xbox::VTask::_Run() + 234
4   com.4d.kernel                     0x0000000110bbfb01
xbox::XMacTask_preemptive::_ThreadProc(void*) + 145
5   libsystem_pthread.dylib           0x00007fff6e307661 _pthread_body + 340
6   libsystem_pthread.dylib           0x00007fff6e30750d _pthread_start + 377
7   libsystem_pthread.dylib           0x00007fff6e306bf9 thread_start + 13
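
Mail clients wrap the long frame lines, as in the backtrace above, which makes the frames harder to compare between reports. Here is a small Python sketch that re-joins wrapped lines and lists the crashed thread's frames, top of stack (where it died) first. It assumes any ">" quote markers have already been stripped, and the function name is made up for illustration.

```python
import re

FRAME = re.compile(r"^(\d+)\s+(.+?)\s+(0x[0-9a-fA-F]+)\s*(.*)$")
HEADER = re.compile(r"^Thread (\d+) Crashed:+\s*(.*)$")


def crashed_frames(report_text):
    """Return ((thread_number, thread_name), [(image, symbol), ...])."""
    thread = None
    frames = []
    for raw in report_text.splitlines():
        line = raw.strip()
        if thread is None:
            m = HEADER.match(line)
            if m:
                thread = (int(m.group(1)), m.group(2))
            continue
        if not line:
            break  # a blank line ends the crashed thread's backtrace
        m = FRAME.match(line)
        if m:
            frames.append([m.group(2), m.group(4)])
        elif frames:
            # Wrapped continuation of the previous frame's symbol.
            frames[-1][1] = (frames[-1][1] + " " + line).strip()
    return thread, [tuple(f) for f in frames]


SAMPLE = """\
Thread 29 Crashed:: ServerNet select I/O handler (id = 90423)
0   com.4d.ServerNet                  0x0000000110d5837e
xbox::VTCPSelectWatchAction::HandleError(fd_set*) + 38
1   com.4d.ServerNet                  0x0000000110d589ea
xbox::VTCPSelectIOHandler::DoRun() + 712
2   com.4d.ServerNet                  0x0000000110d58afd non-virtual
thunk to xbox::VTCPSelectIOHandler::DoRun() + 13
3   com.4d.kernel                     0x0000000110bbadaa
xbox::VTask::_Run() + 234
4   com.4d.kernel                     0x0000000110bbfb01
xbox::XMacTask_preemptive::_ThreadProc(void*) + 145
5   libsystem_pthread.dylib           0x00007fff6e307661 _pthread_body + 340
6   libsystem_pthread.dylib           0x00007fff6e30750d _pthread_start + 377
7   libsystem_pthread.dylib           0x00007fff6e306bf9 thread_start + 13
"""

thread, frames = crashed_frames(SAMPLE)
print(thread)
for image, symbol in frames:
    print(image, "|", symbol)
```

With one frame per line, the top frame of each report can be compared directly to see whether the same function keeps showing up.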

On Tue, Sep 4, 2018 at 10:40 AM JPR via 4D_Tech <[hidden email]> wrote:

>
> [JPR]
>
> Hi Dave, Tim,
>
> This kind of crash is always difficult to track down, for it is not easily reproducible. From what I see (and as Tim pointed out) it seems there is a memory problem that is revealed in the process LabProjects List. But a memory problem can occur a while before the actual crash, because the application may have corrupted memory and not be aware of it until the crash.
>
> - Is your application compiled? If yes, be sure that the Range checking option is set.
> - Is the LabProjects List process a client process on the server, or a worker or process running on the server?
> - The time of crash seems irrelevant, but maybe it's linked to a peak in activity and server or network stress?
> - A client problem causing a server crash is unlikely, but it may help to know if there is a correlation between the crash and a particular client doing a particular operation.
> - Do you know which method is executed when it crashes?
> - Do you use interprocess variables like arrays for instance?
> - How much memory has been given to the server and to the cache?
>
> This is just a short list of points to check, but it may help to reduce the problem to a small part of the application.
>
> My very best,
>
> JPR



--
David Nasralla
Clean Air Engineering