Tracing "Application Not Responding" Errors Like A Boss

Tracing "Application Not Responding" Errors Like A Boss
Photo by amirali mirhashemian / Unsplash

If you’ve been an Android user or an Android developer long enough, you’ll have run into dialogs like this.  As a developer, it’s a sure-fire way to get yourself a one star review.

Why do you get these dialogs?  Because you’re doing some long-running operation on, or otherwise tying up, the main thread.  For a whole 35 minutes worth of info on the main thread, please do check out this talk from my Amex colleague Michael Bailey at Droidcon NYC.

I’ve noticed, though, that many developers aren’t aware that when your app trips an ANR, Android helpfully provides a lovely thread dump. Looking over it can tell you precisely what’s going on and what blocked the main thread.  It will make your life a ton easier when you’re fighting against pesky ANR conditions.

Let’s look at an example! The whole example app here is up on Github so you can run it on your own side and explore the ANR traces.

void triggerAnrWithLongOperation() {  
    try {
        Thread.sleep(30*1000);
    } catch (InterruptedException e) {

    }
}

This code here is a first-class ticket to ANR-ville. When you run it, pay close attention to your logcat:

02-28 12:07:20.418  1534  1539 I art     : Wrote stack traces to '/data/anr/traces.txt' 02-28 12:07:21.885  1534  1548 E ActivityManager: ANR in com.example.anrdemo (com.example.anrdemo/.MainActivity)  

The Android framework writes you a set of thread stack traces to local storage so you can debug your problem. Let’s get a look at what it says.

"main" prio=5 tid=1 Sleeping
 | group="main" sCount=1 dsCount=0 obj=0x7507a710 self=0x7fcbdd47ba00
 | sysTid=14659 nice=0 cgrp=default sched=0/0 handle=0x7fcbe0e00200
 | state=S schedstat=( 0 0 0 ) utm=6 stm=5 core=1 HZ=100
 | stack=0x7fff98589000-0x7fff9858b000 stackSize=8MB
 | held mutexes=
 at java.lang.Thread.sleep!(Native method)
 - sleeping on <0x04612486> (a java.lang.Object)
 at java.lang.Thread.sleep(Thread.java:1031)
 - locked <0x04612486> (a java.lang.Object)
 at java.lang.Thread.sleep(Thread.java:985)
 at com.example.anrdemo.MainActivity.triggerAnrWithLongRun(MainActivity.java:30)

So now we’ve found our main thread, and we’re sleeping, which is causing the problem.  We even get a stack trace that makes it easy to see where this is coming from.

Now obviously in the real-world our ANRs aren’t caused by simple sleep calls.  We may be accidentally running network operations on the main thread, or more insidiously, we may find ourselves in thread deadlock situations.  Dealing with deadlock due to buggy concurrent code is where the ANR traces really shine.

Let’s take a look at this method call.  Here’s some bad buggy code that illustrates a contrived deadlock situation:

void triggerDeadlock() {

    final CountDownLatch countDown = new CountDownLatch(1);

    HandlerThread handlerThread = new HandlerThread("Deadlocker");
    handlerThread.start();
    Handler handler = new Handler(handlerThread.getLooper());
    handler.post(new Runnable() {
        @Override
        public void run() {

            // this thread obtains a lock on the MainActivity instance and sits on it. That's bad.
            synchronized(MainActivity.this) {
                countDown.countDown();
                try {
                    Thread.sleep(30*1000);
                } catch (InterruptedException e) {
                }
            }
        }
    });

    // this runs on the main thread
    // await to make sure we don't try running this until the lock is 
    // obtained
    try {
        countDown.await();
    } catch (InterruptedException e) {
    }

    synchronized (this) {
        Log.v(TAG, "Hello world from the main thread");
    }
}

So we’ll spin off a second HandlerThread, which then obtains a lock on the MainActivity and sleeps.  Then, back in our main thread, we’ll try to obtain a lock on ourselves, which will end up having to wait for our HandlerThread’s sleep to finish.  Ultimately Android will trip an ANR before the sleep finishes.

Let’s pull open the trace:

"main" prio=5 tid=1 Blocked
 | group="main" sCount=1 dsCount=0 obj=0x7507a710 self=0x7fcbdd47ba00
 | sysTid=16815 nice=0 cgrp=default sched=0/0 handle=0x7fcbe0e00200
 | state=S schedstat=( 0 0 0 ) utm=8 stm=3 core=1 HZ=100
 | stack=0x7fff98589000-0x7fff9858b000 stackSize=8MB
 | held mutexes=
 at com.example.anrdemo.MainActivity.triggerDeadlock(MainActivity.java:60)
 - waiting to lock <0x068aba68> (a com.example.anrdemo.MainActivity) held by thread 14

So here in the main thread (thread ID #1), we’re waiting to obtain a lock on this object – our MainActivity.  The lock currently is being held by thread with ID #14 – let’s get a look at that.

"Deadlocker" prio=5 tid=14 Sleeping
 | group="main" sCount=1 dsCount=0 obj=0x12e98520 self=0x7fcbd4984800
 | sysTid=16837 nice=0 cgrp=default sched=0/0 handle=0x7fcbcc7bf440
 | state=S schedstat=( 0 0 0 ) utm=0 stm=0 core=0 HZ=100
 | stack=0x7fcbcc6bd000-0x7fcbcc6bf000 stackSize=1037KB
 | held mutexes=
 at java.lang.Thread.sleep!(Native method)
 - sleeping on <0x0bf3629d> (a java.lang.Object)
 at java.lang.Thread.sleep(Thread.java:1031)
 - locked <0x0bf3629d> (a java.lang.Object)
 at java.lang.Thread.sleep(Thread.java:985)
 at com.example.anrdemo.MainActivity$1.run(MainActivity.java:52)
 - locked <0x068aba68> (a com.example.anrdemo.MainActivity)
 at android.os.Handler.handleCallback(Handler.java:739)
 at android.os.Handler.dispatchMessage(Handler.java:95)
 at android.os.Looper.loop(Looper.java:148)
 at android.os.HandlerThread.run(HandlerThread.java:61)

So for starters, note that the name of the thread in the trace is “Deadlocker.” This is handy; when you instantiate your HandlerThread the name passed in the constructor will show up here in the trace.

So ultimately, by picking through the ANR trace, we’re able to find that our main thread is waiting to obtain a lock that’s held by the sleeping Deadlocker thread.

Using the automatically-generated thread traces has saved me a ton of time and headaches versus trying to profile the app or debug into ANR situations.  I hope this helps you, too!