I finally found the root cause: it lies in the binder kernel driver.
So far I have found two causes for a DeadObjectException being thrown in BroadcastQueue, followed by a RemoteServiceException in the app's ActivityThread:
- There is no asynchronous space left to execute the binder transaction when AMS sends a one-way binder call to ActivityThread in order to trigger BroadcastReceiver.onReceive.
The related code is shown below:
kernel/msm-4.4/drivers/android/binder_alloc.c
if (is_async &&
    alloc->free_async_space < size + sizeof(struct binder_buffer)) {
        binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
                           "%d: binder_alloc_buf size %zd failed, no async space left\n",
                           alloc->pid, size);
        eret = ERR_PTR(-ENOSPC);
        goto error_unlock;
}
Therefore, this will not "end up destabilizing the system". It only affects the application itself.
- The user application is force closed because BroadcastQueue sends a scheduleCrash binder call to ActivityThread. The root cause is that no binder buffer space is left on the application side, because some binder threads occupy most of it.
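To make the check above easier to follow, here is a minimal user-space sketch of the async accounting. It is not the driver code: the model_* names and MODEL_BUFFER_OVERHEAD are hypothetical, the overhead value merely stands in for sizeof(struct binder_buffer), and the sketch assumes the async budget starts at half of the mapped buffer size.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for sizeof(struct binder_buffer); the real value
 * depends on the kernel build. */
#define MODEL_BUFFER_OVERHEAD 64

/* Simplified model of the async accounting in a binder allocator. */
struct model_alloc {
    size_t buffer_size;      /* total mapped size for the process */
    size_t free_async_space; /* remaining budget for one-way calls */
};

static void model_alloc_init(struct model_alloc *alloc, size_t buffer_size)
{
    alloc->buffer_size = buffer_size;
    /* Assumption: the async budget starts at half of the mapping. */
    alloc->free_async_space = buffer_size / 2;
}

/* Returns 0 on success, -ENOSPC when the async budget is exhausted. */
static int model_alloc_async(struct model_alloc *alloc, size_t size)
{
    size_t needed = size + MODEL_BUFFER_OVERHEAD;

    if (alloc->free_async_space < needed)
        return -ENOSPC;
    alloc->free_async_space -= needed;
    return 0;
}

/* Gives the budget back once the receiver has processed the transaction. */
static void model_free_async(struct model_alloc *alloc, size_t size)
{
    alloc->free_async_space += size + MODEL_BUFFER_OVERHEAD;
}
```

In this model, a receiver that is slow to process its pending one-way transactions keeps the budget low, so a later one-way call fails with -ENOSPC even though the receiving process is still alive.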
The bug can be triggered with the following steps:
- Process1 sends a large payload (e.g. 980kB) to Process2. Process2 sleeps for 30 seconds while handling it, so the large binder buffer is not released.
- Process1 sends a broadcast to Process2 carrying e.g. 50kB of data. Together with the 980kB still held, that exceeds the buffer capacity of 1016kB (980kB + 50kB = 1030kB).
- A DeadObjectException is thrown in BroadcastQueue, which then sends scheduleCrash to ActivityThread on the application side.
Here is the code:
kernel/msm-4.4/drivers/android/binder_alloc.c
if (best_fit == NULL) {
        ...
        pr_err("%d: binder_alloc_buf size %zd failed, no address space\n",
               alloc->pid, size);
        pr_err("allocated: %zd (num: %zd largest: %zd), free: %zd (num: %zd largest: %zd)\n",
               total_alloc_size, allocated_buffers, largest_alloc_size,
               total_free_size, free_buffers, largest_free_size);
        eret = ERR_PTR(-ENOSPC);
        goto error_unlock;
}
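The arithmetic of the reproduction steps can be checked with a small user-space model of a best-fit search over free chunks. This is only a sketch under simplifying assumptions: the model_* helpers are hypothetical, and per-buffer overhead and alignment are ignored. The real allocator tracks free buffers in its own data structures.

```c
#include <errno.h>
#include <stddef.h>

#define KB 1024u

/* Returns the index of the smallest free chunk that can hold `size`,
 * or -1 when no chunk is large enough (the best_fit == NULL case). */
static int model_best_fit(const size_t *free_chunks, size_t count, size_t size)
{
    int best = -1;

    for (size_t i = 0; i < count; i++) {
        if (free_chunks[i] < size)
            continue;
        if (best < 0 || free_chunks[i] < free_chunks[best])
            best = (int)i;
    }
    return best;
}

/* Carves `size` bytes out of the best-fitting chunk.
 * Returns 0 on success, -ENOSPC when nothing fits. */
static int model_alloc_buf(size_t *free_chunks, size_t count, size_t size)
{
    int idx = model_best_fit(free_chunks, count, size);

    if (idx < 0)
        return -ENOSPC;
    free_chunks[idx] -= size;
    return 0;
}
```

Starting from a single free chunk of 1016 * KB, allocating 980 * KB succeeds and leaves 36 * KB free. A following request for 50 * KB then finds no fitting chunk and returns -ENOSPC, which corresponds to the "no address space" path above.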
In conclusion, a DeadObjectException can be thrown even if the application process has not died.
The root cause is most likely a full binder buffer for the application, which does not affect the rest of the system.
So I think it is not necessary to make the application crash after catching a DeadObjectException in BroadcastQueue.
Whoever fights a dragon for too long becomes a dragon himself; gaze too long into the abyss, and the abyss will gaze back…