Detecting out of memory errors

I would like to give my system a way of detecting whether an OutOfMemoryError has occurred. The aim is to expose this flag through JMX and act on it (e.g. by configuring a relevant alert in the monitoring system); otherwise these errors sit unnoticed for days.

A naive approach would be to set an uncaught exception handler for every thread, check whether the raised exception is an instance of OutOfMemoryError, and set a relevant flag. However, this approach isn't realistic for the following reasons:

  • The exception can occur anywhere, including 3rd party libraries. There is nothing I can do to prevent them catching Throwable and keeping it for themselves.
  • Libraries can spawn their own threads and I have no way of enforcing uncaught exception handlers for these threads.

One possible approach I see is bytecode manipulation (e.g. attaching some sort of aspect to OutOfMemoryError); however, I am not sure whether that's the right approach or whether it is doable at all.

We have -XX:+HeapDumpOnOutOfMemoryError enabled, but I don't see this as a solution for this problem as it was designed for something else - and it provides no Java callback when this happens.

Has anyone done this? How would you solve it or suggest solving it? Any ideas are welcome.



Solution 1:[1]

You could use an out-of-memory warning system; this OutOfMemoryError Warning System can serve as inspiration. You can configure a listener that is invoked once a certain memory threshold (say 80%) is breached, and use that invocation to start taking corrective measures.

We use something similar: we suspend the component's service when its memory usage reaches 80% and start the clean-up action; the component comes back only when used memory drops below another configurable threshold.
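A minimal sketch of such a threshold listener, using the standard java.lang.management API. The class name MemoryWarning, the LOW_MEMORY flag and the 0.8 fraction are illustrative assumptions, not part of the linked warning system; the listener only sets a flag, which you could then surface through JMX or use to suspend a service.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryNotificationInfo;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.concurrent.atomic.AtomicBoolean;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;

public class MemoryWarning {
    // Hypothetical flag; set once any heap pool crosses its usage threshold.
    static final AtomicBoolean LOW_MEMORY = new AtomicBoolean(false);

    /**
     * Registers a listener and sets a usage threshold (fraction of max)
     * on every heap pool that supports one. Returns the number of pools configured.
     */
    static int install(double fraction) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        NotificationListener listener = (notification, handback) -> {
            if (MemoryNotificationInfo.MEMORY_THRESHOLD_EXCEEDED.equals(notification.getType())) {
                LOW_MEMORY.set(true);
            }
        };
        // The platform MemoryMXBean is documented to be a NotificationEmitter.
        ((NotificationEmitter) memory).addNotificationListener(listener, null, null);

        int configured = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP || !pool.isUsageThresholdSupported()) {
                continue;
            }
            MemoryUsage usage = pool.getUsage();
            // getMax() returns -1 when the pool has no defined maximum.
            if (usage != null && usage.getMax() > 0) {
                pool.setUsageThreshold((long) (usage.getMax() * fraction));
                configured++;
            }
        }
        return configured;
    }

    public static void main(String[] args) {
        System.out.println("Pools configured: " + install(0.8));
    }
}
```

Which pools support a usage threshold depends on the garbage collector in use, so it is worth logging the returned count at startup.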

Solution 2:[2]

There is an article based on the post that Scorpion has already given a link to.

The technique is again based on using MemoryPoolMXBean and subscribing to the "memory threshold exceeded" event, but it's slightly different from what was described in the original post.

The author states that when you subscribe to the plain "memory threshold exceeded" event, there is a possibility of a false alarm. Imagine a situation where memory consumption is above the threshold, but a garbage collection is about to run and will free a lot of that memory. In fact, that situation is quite common in real-world applications.

Fortunately, there is another threshold, "collection usage threshold", and a corresponding event, which is fired based on memory consumption right after garbage collection. When you receive that event, you can be much more confident you're running out of memory.
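A sketch of the collection-usage variant, assuming the same standard java.lang.management API. The class name PostGcMemoryWarning, the LOW_AFTER_GC flag and the 0.8 fraction are hypothetical; the only real differences from the plain usage threshold are setCollectionUsageThreshold and the MEMORY_COLLECTION_THRESHOLD_EXCEEDED notification type.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryNotificationInfo;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.concurrent.atomic.AtomicBoolean;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;

public class PostGcMemoryWarning {
    // Hypothetical flag; set when a heap pool is still above the threshold right after a GC.
    static final AtomicBoolean LOW_AFTER_GC = new AtomicBoolean(false);

    /** Returns the number of heap pools on which a collection usage threshold was set. */
    static int install(double fraction) {
        NotificationListener listener = (notification, handback) -> {
            if (MemoryNotificationInfo.MEMORY_COLLECTION_THRESHOLD_EXCEEDED
                    .equals(notification.getType())) {
                LOW_AFTER_GC.set(true);
            }
        };
        NotificationEmitter emitter = (NotificationEmitter) ManagementFactory.getMemoryMXBean();
        emitter.addNotificationListener(listener, null, null);

        int configured = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP || !pool.isCollectionUsageThresholdSupported()) {
                continue;
            }
            MemoryUsage usage = pool.getUsage();
            // Skip pools without a defined maximum (getMax() == -1).
            if (usage != null && usage.getMax() > 0) {
                pool.setCollectionUsageThreshold((long) (usage.getMax() * fraction));
                configured++;
            }
        }
        return configured;
    }

    public static void main(String[] args) {
        System.out.println("Pools configured: " + install(0.8));
    }
}
```

Because the check happens only after a collection, the flag reflects memory that the collector could not reclaim, which is the signal the article argues for.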

Solution 3:[3]

"We have -XX:+HeapDumpOnOutOfMemoryError enabled, but I don't see this as a solution for this problem as it was designed for something else - and it provides no Java callback when this happens."

This flag should be all that you need. Set the output directory of the resulting heap dump file to some known location that you check regularly. Having a callback would be of no use to you. If you are out of memory, you can't guarantee that the callback code has enough memory to execute! All you can do is collect the data and use an external program to analyze why you ran out of memory. Any attempt at recovering in-process can create bigger problems.

Bytecode instrumentation is possible - but hard. HPjmeter's monitoring tool has the ability to predict future OOMs (with caveats) -- but only on HP-UX/Itanium based systems. You could dedicate a daemon thread to calculating used memory in-process and trigger an alert when it exceeds a threshold, but really you're not solving the problem.
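For illustration only, the daemon-thread idea could look roughly like the sketch below. The class name HeapPoller, the ABOVE_THRESHOLD flag and the polling period are assumptions; it measures used heap with Runtime, so it is subject to the same false alarms as any pre-GC reading.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class HeapPoller {
    // Hypothetical flag reflecting the most recent poll.
    static final AtomicBoolean ABOVE_THRESHOLD = new AtomicBoolean(false);

    /** Fraction of the maximum heap currently in use (includes not-yet-collected garbage). */
    static double usedFraction() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return (double) used / rt.maxMemory();
    }

    /** Starts a daemon thread that re-evaluates the flag every periodMillis milliseconds. */
    static ScheduledExecutorService start(double threshold, long periodMillis) {
        ScheduledExecutorService poller = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "heap-poller");
            t.setDaemon(true); // must not keep the JVM alive
            return t;
        });
        poller.scheduleAtFixedRate(
                () -> ABOVE_THRESHOLD.set(usedFraction() > threshold),
                0, periodMillis, TimeUnit.MILLISECONDS);
        return poller;
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService poller = start(0.8, 1000);
        Thread.sleep(1500);
        System.out.println("Above threshold: " + ABOVE_THRESHOLD.get());
        poller.shutdownNow();
    }
}
```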

Solution 4:[4]

You can catch any and all uncaught exceptions with the static Thread.setDefaultUncaughtExceptionHandler. Of course, it doesn't help if someone is catching all Throwables. (I don't think anything will, though with an OOME I'd suspect you'd get a cascading effect until something outside the offending try block blew up.) Hopefully the thread would have released enough memory for the exception handler to work; OOM errors do tend to multiply as you try to deal with them.
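A minimal sketch of that handler, assuming a hypothetical OomFlagHandler class with an OOM_SEEN flag that you could expose through JMX. It sets the flag before doing anything that allocates, then delegates to whatever default handler was installed before it.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class OomFlagHandler {
    // Hypothetical flag; true once an uncaught OutOfMemoryError has been seen.
    static final AtomicBoolean OOM_SEEN = new AtomicBoolean(false);

    /** True if the throwable or one of its causes (up to 16 levels) is an OutOfMemoryError. */
    static boolean isOutOfMemory(Throwable t) {
        for (int depth = 0; t != null && depth < 16; depth++, t = t.getCause()) {
            if (t instanceof OutOfMemoryError) {
                return true;
            }
        }
        return false;
    }

    /** Installs the JVM-wide default handler, chaining to the previous one if present. */
    static void install() {
        Thread.UncaughtExceptionHandler previous = Thread.getDefaultUncaughtExceptionHandler();
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
            // Set the flag first: this does not allocate, unlike the logging below.
            if (isOutOfMemory(error)) {
                OOM_SEEN.set(true);
            }
            if (previous != null) {
                previous.uncaughtException(thread, error);
            } else {
                System.err.println("Uncaught in " + thread.getName() + ": " + error);
            }
        });
    }

    public static void main(String[] args) throws InterruptedException {
        install();
        // Simulated error; a real OOME would come from a failed allocation.
        Thread worker = new Thread(() -> { throw new OutOfMemoryError("simulated"); });
        worker.start();
        worker.join();
        System.out.println("OOM seen: " + OOM_SEEN.get());
    }
}
```

The default handler only runs for threads that have no handler of their own and whose thread group does not override uncaughtException, and it never sees errors that library code catches and swallows.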

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Gray
Solution 2
Solution 3
Solution 4 RalphChapin