Stacktrace review: concern, probable solution and review news
Hi,
The biggest concern with the stacktrace library is it's usability in
async-signal-handlers. Here are some facts:
* users definitely want to use the library in async-signal-handlers to
print stacktraces on SIGSEGV/SIGBUS/SIGABRT
* the only cross-platform async-signal-safe thing that can be done -
is to print function addresses (without names and source line
locations)
* reviewers wish to have a smaller size of the `stacktrace` class,
want to avoid template parameter and have a std::vector<frames> like
interface (this is async unsafe because vector does dynamic
allocations).
So, the solution:
* Change `basic_stacktrace
Antony Polukhin wrote:
* Take a look at the llvm-symbolizer, atos and libdwraf; try hard to produce a non-forking solution
This could help: https://github.com/facebookarchive/atosl
* ? Deserialization/inspection of stacktrace on other machine
For that (actually, for the serialization/saving part) you need to give the user a list of modules and their load addresses. EnumProcessModules + GetModuleInformation + GetModuleFileName on Windows, so that the addresses are saved out as offsets within a module, not as absolute addresses.
Things NOT to do ever: * Callbacks to do things on stacktraces - no way to satisfy all the needs and make them optimal on all the platforms. Do some caching instead
I don't think that you can have an async-safe interface that does not replicate work needlessly per frame without using callbacks. Well, you could, if the user passes you large enough arrays, I suppose. char function[MAX_FRAMES][MAX_LENGTH], and same for file. A callback-based API needs much less memory. Or you could not provide an API for getting the functions/files/lines at all in one go and only offer output in text form.
On 12/21/16 22:43, Antony Polukhin wrote:
Hi,
* reviewers wish to have a smaller size of the `stacktrace` class, want to avoid template parameter and have a std::vector<frames> like interface (this is async unsafe because vector does dynamic allocations).
Actually, that depends on the allocator. A stack-based allocator or an allocator backed by static storage could be async-safe, making `vector` async-safe as well.
So, the solution: * Change `basic_stacktrace
` to `basic_stacktrace >` and make it's constructor async-unsafe and add additional runtime parameter for stack depth.
Why not:
template<
class Backend = default_backend,
class Allocator = std::allocator
class basic_stacktrace; ? (As noted above, the constructor can still be signal-safe with an appropriate allocator.) I suggested in my review that backend as a template parameter could be beneficial and makes more sense. Note also that the allocator is instantiated on `frame`, which is what is supposed to be the element type of the stacktrace container. If `frame` is a standard layout class with only `void*` as its data member, you could still use it with underlying APIs that take `void**` as the list of stack frames. It would require a `reinterpret_cast` inside the library to cast between `frame*` and `void**`, but in return the user's interface would be clean and operate on `frame`s instead of void pointers.
* Provide async-safe functions that dump addresses separated by whitespace (format also acceptable by addr2line and atos): * `void dump_stacktrace(int fd)` * `void dump_stacktrace(HANDLE fd)` * `void dump_stacktrace(const char* filename)`
Yes, if that's in addition to the non-async-safe function that decodes the stacktrace in-process. Also, on Windows you might have to provide `wchar_t`-based function as well. I'm not sure if `char`-based one would be safe to use in a signal handler since narow character WinAPI perform character conversion internally.
* Provide an example with dumping callstack to a file and printing it on program restart
I would suggest an example of printing the stacktrace by a separate program. Printing the stacktrace upon restart seems like a contrived case.
My list of TODOs for the library: * Take a look at the llvm-symbolizer, atos and libdwraf; try hard to produce a non-forking solution
+many. Thanks for working on this.
On 12/22/2016 08:28 AM, Andrey Semashev wrote:
On 12/21/16 22:43, Antony Polukhin wrote:
Hi,
* reviewers wish to have a smaller size of the `stacktrace` class, want to avoid template parameter and have a std::vector<frames> like interface (this is async unsafe because vector does dynamic allocations).
Actually, that depends on the allocator. A stack-based allocator or an allocator backed by static storage could be async-safe, making `vector` async-safe as well.
I am not sure about a stack-based allocator as I'd expect the stack to be wiped out but a static-storage allocator sounds like an excellent idea to deal with many/all async-related issues. Could that be the default? I'd really like the default-deployment case to be as automatic as possible... so that I could use it without wrecking my brain and to make sure junior developers are not repelled by the complexity and actually use it.
...
* Provide async-safe functions that dump addresses separated by whitespace (format also acceptable by addr2line and atos): * `void dump_stacktrace(int fd)` * `void dump_stacktrace(HANDLE fd)` * `void dump_stacktrace(const char* filename)`
Yes, if that's in addition to the non-async-safe function that decodes the stacktrace in-process.
Yes, please. "Decoding the stacktrace in-process" seems very desirable.
...
* Provide an example with dumping callstack to a file and printing it on program restart
I would suggest an example of printing the stacktrace by a separate program. Printing the stacktrace upon restart seems like a contrived case.
Printing the stacktrace upon restart does seem like unusual that might take some getting used to. Still, I personally find quite an attractive alternative to a separate/additional/extra program necessary to retrieve the info.
On Thu, Dec 22, 2016 at 2:09 AM, Vladimir Batov
On 12/22/2016 08:28 AM, Andrey Semashev wrote:
On 12/21/16 22:43, Antony Polukhin wrote:
* reviewers wish to have a smaller size of the `stacktrace` class, want to avoid template parameter and have a std::vector<frames> like interface (this is async unsafe because vector does dynamic allocations).
Actually, that depends on the allocator. A stack-based allocator or an allocator backed by static storage could be async-safe, making `vector` async-safe as well.
I am not sure about a stack-based allocator as I'd expect the stack to be wiped out but a static-storage allocator sounds like an excellent idea to deal with many/all async-related issues.
The idea is that the stack-based allocator is used in a signal handler. The stacktrace created in that context cannot escape the signal handler and can be used e.g. to save the backtrace to a file. The static storage fits that use case as well, although care must be taken to avoid concurrency issues. Perhaps, a thread-specific static storage would be an even better alternative.
Could that be the default?
I don't think that would make a good default. After all, when you create a stacktrace in the normal code you most likely want it to be independent from the current stack frame or global state.
On 12/22/2016 10:21 AM, Andrey Semashev wrote:
...
The idea is that the stack-based allocator is used in a signal handler. The stacktrace created in that context cannot escape the signal handler and can be used e.g. to save the backtrace to a file.
I hate to sound like a broken record but every time I read "to a file" I get concerned as it does not seem to fit our deployment case. Airline scheduling and disruption management. There are people on call 24/7 to address the situations when our s/w crashes. The operators have no issues/difficulties with the logs and, in fact, they send us those automatically without asking. It is really a stretch expecting them to find, handle, copy files or to run an extra application. Probably doable but I am not convinced. So, retrieving such a file from their secured system will be a royal pain. I might have missed that but dumping the textual stack-trace to the log is still on the cards, right?
The static storage fits that use case as well, although care must be taken to avoid concurrency issues. Perhaps, a thread-specific static storage would be an even better alternative.
Yes, thread_local certainly seems in order... Although I am under impression that POSIX signals are delivered to only one thread and, when that happens, the other threads are stopped. If so, then no need for thread_local... But I am admittedly hazy on handling signals in MT environment.
On 12/22/16 23:06, Vladimir Batov wrote:
On 12/22/2016 10:21 AM, Andrey Semashev wrote:
The idea is that the stack-based allocator is used in a signal handler. The stacktrace created in that context cannot escape the signal handler and can be used e.g. to save the backtrace to a file.
I hate to sound like a broken record but every time I read "to a file" I get concerned as it does not seem to fit our deployment case. Airline scheduling and disruption management. There are people on call 24/7 to address the situations when our s/w crashes. The operators have no issues/difficulties with the logs and, in fact, they send us those automatically without asking. It is really a stretch expecting them to find, handle, copy files or to run an extra application. Probably doable but I am not convinced. So, retrieving such a file from their secured system will be a royal pain. I might have missed that but dumping the textual stack-trace to the log is still on the cards, right?
Having a decoded stacktrace as a result of signal would be ideal. But apparently that is not possible without either forking the process and decoding the stacktrace in the child, or dumping undecoded stacktrace to a file so that it can be later processed by another application. The more I think of it, the less Boost.Stacktrace seems suited for the crash handler case. It doesn't load debug symbols, it doesn't allow dumping all threads in the crashed application. Really, I can't see myself wanting to use Boost.Stacktrace instead of gdb to create crash reports. Maybe the library should just focus on the use in general contexts instead of signal handlers. There's too much missing otherwise.
The static storage fits that use case as well, although care must be taken to avoid concurrency issues. Perhaps, a thread-specific static storage would be an even better alternative.
Yes, thread_local certainly seems in order... Although I am under impression that POSIX signals are delivered to only one thread and, when that happens, the other threads are stopped. If so, then no need for thread_local... But I am admittedly hazy on handling signals in MT environment.
Generally, that depends on how you set up signal handling. But you definitely want to process crash signals in the offending thread to get its stacktrace, which means the signal handler can be called in multiple threads.
On 2016-12-23 09:03, Andrey Semashev wrote:
On 12/22/16 23:06, Vladimir Batov wrote:
On 12/22/2016 10:21 AM, Andrey Semashev wrote:
The idea is that the stack-based allocator is used in a signal handler. The stacktrace created in that context cannot escape the signal handler and can be used e.g. to save the backtrace to a file.
I hate to sound like a broken record but every time I read "to a file" I get concerned as it does not seem to fit our deployment case. Airline scheduling and disruption management. There are people on call 24/7 to address the situations when our s/w crashes. The operators have no issues/difficulties with the logs and, in fact, they send us those automatically without asking. It is really a stretch expecting them to find, handle, copy files or to run an extra application. Probably doable but I am not convinced. So, retrieving such a file from their secured system will be a royal pain. I might have missed that but dumping the textual stack-trace to the log is still on the cards, right?
Having a decoded stacktrace as a result of signal would be ideal. But apparently that is not possible without either forking the process and decoding the stacktrace in the child, or dumping undecoded stacktrace to a file so that it can be later processed by another application.
Maybe I panicked too early. If it has to be as described above, then we'll probably have our users to run a script running the actual application plus that additional dump-decoding application to have the textual stack-trace as the output. GDB can do that for un-stripped application. If you lib can do that for a stripped app, then it'll be a plus.
The more I think of it, the less Boost.Stacktrace seems suited for the crash handler case. It doesn't load debug symbols, it doesn't allow dumping all threads in the crashed application. Really, I can't see myself wanting to use Boost.Stacktrace instead of gdb to create crash reports.
Even though I do not see myself wanting "all threads in the crashed application" I do tend to agree that it'll be hard/impossible to beat gdb to create crash reports. However, I readily admit I am not an expert in the area.
Maybe the library should just focus on the use in general contexts instead of signal handlers. There's too much missing otherwise.
That's, in fact, how we use backtrace(), backtrace_symbols() -- when I check pre/post conditions and realize that things have been messed up. Then I log the stack-trace with some more info and exit/die. It's not a replacement for gdb-based core analysis. It's a part of a quick what-went-wrong message which is often sufficient to nail the problem and fix it without getting the core and analyzing it.
The static storage fits that use case as well, although care must be taken to avoid concurrency issues. Perhaps, a thread-specific static storage would be an even better alternative.
Yes, thread_local certainly seems in order... Although I am under impression that POSIX signals are delivered to only one thread and, when that happens, the other threads are stopped. If so, then no need for thread_local... But I am admittedly hazy on handling signals in MT environment.
Generally, that depends on how you set up signal handling. But you definitely want to process crash signals in the offending thread to get its stacktrace, which means the signal handler can be called in multiple threads.
Yes, the signal handler can be called in multiple threads. However, I was/am under impression that while inside a signal handler you do not need to serialize access to some global memory as there won't be another signal handler running in another thread. I might be wrong though.
2016-12-22 23:06 GMT+03:00 Vladimir Batov
On 12/22/2016 10:21 AM, Andrey Semashev wrote:
...
The idea is that the stack-based allocator is used in a signal handler. The stacktrace created in that context cannot escape the signal handler and can be used e.g. to save the backtrace to a file.
I hate to sound like a broken record but every time I read "to a file" I get concerned as it does not seem to fit our deployment case. Airline scheduling and disruption management. There are people on call 24/7 to address the situations when our s/w crashes. The operators have no issues/difficulties with the logs and, in fact, they send us those automatically without asking. It is really a stretch expecting them to find, handle, copy files or to run an extra application. Probably doable but I am not convinced. So, retrieving such a file from their secured system will be a royal pain. I might have missed that but dumping the textual stack-trace to the log is still on the cards, right?
Actually, I think that Stacktrace must perfectly fit your needs. Here's how I understand your situation: you've got a big application, that runs 24/7 and is automatically restarted after a crash; a support team of non-developers that monitor it and send logs. It would be a disaster if your application hangs, so you need an async-signal-safe code. Sending corefiles is hard - application is big and support team is not trained to do that. Log files with a stacktrace would significantly help you in debugging. Here's how you can use the stacktrace: dump_stacktrace("/staktrace.txt"); // in signal handler, async signal safe ... // in main() if (filesystem::exists("/staktrace.txt")) { std::ifstream ifs("/staktrace.txt"); } -- Best regards, Antony Polukhin
Sorry, accidentally send the unfinished previous mail. Here's the full version:
2016-12-22 23:06 GMT+03:00 Vladimir Batov
On 12/22/2016 10:21 AM, Andrey Semashev wrote:
...
The idea is that the stack-based allocator is used in a signal handler. The stacktrace created in that context cannot escape the signal handler and can be used e.g. to save the backtrace to a file.
I hate to sound like a broken record but every time I read "to a file" I get concerned as it does not seem to fit our deployment case. Airline scheduling and disruption management. There are people on call 24/7 to address the situations when our s/w crashes. The operators have no issues/difficulties with the logs and, in fact, they send us those automatically without asking. It is really a stretch expecting them to find, handle, copy files or to run an extra application. Probably doable but I am not convinced. So, retrieving such a file from their secured system will be a royal pain. I might have missed that but dumping the textual stack-trace to the log is still on the cards, right?
Actually, I think that Stacktrace must perfectly fit your needs. Here's how I understand your situation: you've got a big application, that runs 24/7 and is automatically restarted after a crash; a support team of non-developers that monitor it and send logs. It would be a disaster if your application hangs, so you need an async-signal-safe code. Sending corefiles is hard - application is big and support team is not trained to do that. Log files with a stacktrace would significantly help you in debugging. Here's how you can use the stacktrace: dump_stacktrace("/staktrace.txt"); // in signal handler, async signal safe ... // in main() if (filesystem::exists("/staktrace.txt")) { std::ifstream ifs("/staktrace.txt"); ifs >> stacktrace_variable; // pseudo code, have not thought it fully through log << stacktrace; // outputting stacktrace of previous crash with all the function names and source lines ifs.close(); filesystem::remove("/staktrace.txt"); } This will give you the stacktrace of the previous crash in 100% safe manner that avoids hangs without any manual work from the support team. -- Best regards, Antony Polukhin
Antony, Thank you for your reply and patience. Much appreciated. Your description (below) of our deployment environment is 90% right on the money... the remaining 10% are merely minor details. Very impressive! You are not developing a MindReader library, are you? :-) You won't believe how much operators hate/shun binary files... anything non-textual that they can't read... it's irrational but ultimately they call the shots... if we want their business of course. :-) And your code snippet is very straightforward, makes sense and pretty much what I need. Many thanks. I already voted for the inclusion of your lib... Can I now vote again with my other hand please? :-) I hope all goes well for your library and you personally. Can't wait to see you lib in Boost. On 2016-12-24 06:08, Antony Polukhin wrote:
2016-12-22 23:06 GMT+03:00 Vladimir Batov
: On 12/22/2016 10:21 AM, Andrey Semashev wrote:
... to save the backtrace to a file.
... I might have missed that but dumping the textual stack-trace to the log is still on the cards, right?
Actually, I think that Stacktrace must perfectly fit your needs. Here's how I understand your situation: you've got a big application, that runs 24/7 and is automatically restarted after a crash; a support team of non-developers that monitor it and send logs. It would be a disaster if your application hangs, so you need an async-signal-safe code. Sending corefiles is hard - application is big and support team is not trained to do that. Log files with a stacktrace would significantly help you in debugging.
Here's how you can use the stacktrace:
dump_stacktrace("/staktrace.txt"); // in signal handler, async signal safe
... // in main() if (filesystem::exists("/staktrace.txt")) { std::ifstream ifs("/staktrace.txt"); ifs >> stacktrace_variable; // pseudo code, have not thought it fully through log << stacktrace; // outputting stacktrace of previous crash with all the function names and source lines ifs.close(); filesystem::remove("/staktrace.txt"); }
This will give you the stacktrace of the previous crash in 100% safe manner that avoids hangs without any manual work from the support team.
* Change `basic_stacktrace
` to `basic_stacktrace >` and make it's constructor async-unsafe and add additional runtime parameter for stack depth.
You could also consider using something like boost::container::pmr::polymorphic_allocator to avoid making basic_stacktrace template at all.
participants (5)
-
Andrey Semashev
-
Antony Polukhin
-
Artyom Tokmakov
-
Peter Dimov
-
Vladimir Batov