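The system_server process hosts the services of the Java world, which fall into three groups: bootstrap services, core services, and other services. At boot, the init process launches zygote, which eventually enters its Java-side code. Every later Java process is forked directly or indirectly from zygote, and system_server is the first process zygote forks. That flow is covered in the earlier post on zygote, the ancestor of Java processes; the services discussed below are all started by system_server.
The source code quoted here is from Android 13.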
system_server
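system_server is the first process forked by zygote. Once it has been forked, zygote stops forking processes on its own initiative; it blocks in a loop and waits for messages on its socket that tell it when to fork a new process. Meanwhile the newly started system_server process enters its main function and begins its own flow.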
    public final class SystemServer implements Dumpable {
        private final SystemServerDumper mDumper = new SystemServerDumper();

        public static void main(String[] args) {
            new SystemServer().run();
        }
        ...
        private void run() {
            try {
                t.traceBegin("InitBeforeStartServices");
                ...
                Looper.prepareMainLooper();
                ...
                createSystemContext();
                ...
                ServiceManager.addService("system_server_dumper", mDumper);
                mDumper.addDumpable(this);

                mSystemServiceManager = new SystemServiceManager(mSystemContext);
                mSystemServiceManager.setStartInfo(mRuntimeRestart,
                        mRuntimeStartElapsedTime, mRuntimeStartUptime);
                mDumper.addDumpable(mSystemServiceManager);
                LocalServices.addService(SystemServiceManager.class, mSystemServiceManager);
                SystemServerInitThreadPool tp = SystemServerInitThreadPool.start();
                mDumper.addDumpable(tp);
            } finally {
                t.traceEnd();
            }

            try {
                startBootstrapServices(t);
                startCoreServices(t);
                startOtherServices(t);
                startApexServices(t);
            } catch (Throwable ex) {
                throw ex;
            } finally {
                t.traceEnd();
            }
            ...
            Looper.loop();
            throw new RuntimeException("Main thread loop unexpectedly exited");
        }
    }
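The whole system_server process also parks itself in a loop through the Looper mechanism; see the post on Handler, from Java to native, for details on Looper. The startup does a lot of work, and nearly all of it deserves attention, for example creating the system context and registering the dump service. We will go through it step by step.
Registering the dump service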
    public final class SystemServer implements Dumpable {
        ...
        private final class SystemServerDumper extends Binder {

            @GuardedBy("mDumpables")
            private final ArrayMap<String, Dumpable> mDumpables = new ArrayMap<>(4);

            @Override
            protected void dump(FileDescriptor fd, PrintWriter pw, String[] args) {
                final boolean hasArgs = args != null && args.length > 0;

                synchronized (mDumpables) {
                    if (hasArgs && "--list".equals(args[0])) {
                        final int dumpablesSize = mDumpables.size();
                        for (int i = 0; i < dumpablesSize; i++) {
                            pw.println(mDumpables.keyAt(i));
                        }
                        return;
                    }

                    if (hasArgs && "--name".equals(args[0])) {
                        if (args.length < 2) {
                            pw.println("Must pass at least one argument to --name");
                            return;
                        }
                        final String name = args[1];
                        final Dumpable dumpable = mDumpables.get(name);
                        if (dumpable == null) {
                            pw.printf("No dummpable named %s\n", name);
                            return;
                        }

                        try (IndentingPrintWriter ipw = new IndentingPrintWriter(pw, " ")) {
                            final String[] actualArgs = Arrays.copyOfRange(args, 2, args.length);
                            dumpable.dump(ipw, actualArgs);
                        }
                        return;
                    }

                    final int dumpablesSize = mDumpables.size();
                    try (IndentingPrintWriter ipw = new IndentingPrintWriter(pw, " ")) {
                        for (int i = 0; i < dumpablesSize; i++) {
                            final Dumpable dumpable = mDumpables.valueAt(i);
                            ipw.printf("%s:\n", dumpable.getDumpableName());
                            ipw.increaseIndent();
                            dumpable.dump(ipw, args);
                            ipw.decreaseIndent();
                            ipw.println();
                        }
                    }
                }
            }

            private void addDumpable(@NonNull Dumpable dumpable) {
                synchronized (mDumpables) {
                    mDumpables.put(dumpable.getDumpableName(), dumpable);
                }
            }
        }
    }
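There are many dump services. They back the adb shell dumpsys command: the argument after dumpsys is the name of the service to dump, and each service can be queried for various kinds of information. The one registered during system_server startup is named system_server_dumper. From the class we can see that it is an inner class of SystemServer that extends Binder, which makes it a Binder service that can be registered with ServiceManager.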
    public final class ServiceManager {
        ...
    }
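This ServiceManager class wraps the native-layer ServiceManager, so we can use it directly without caring how the native layer is implemented. All that happens here is that system_server_dumper is registered through ServiceManager.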
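To make the argument handling of the dumper easier to follow, here is a minimal stand-alone sketch of the same dispatch (--list, --name, or dump everything). It is not the AOSP class: MiniDumper, its nested Dumpable interface, the named and sample helpers, and the sample output strings are all hypothetical, and a TreeMap replaces ArrayMap so that the listing order is predictable.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical plain-Java model of the dumper's dispatch; not the AOSP class. */
final class MiniDumper {
    /** Simplified stand-in for android.util.Dumpable. */
    interface Dumpable {
        String getDumpableName();
        void dump(PrintWriter pw, String[] args);
    }

    // TreeMap (an assumption of this sketch) keeps names sorted.
    private final Map<String, Dumpable> dumpables = new TreeMap<>();

    void addDumpable(Dumpable dumpable) {
        synchronized (dumpables) {
            dumpables.put(dumpable.getDumpableName(), dumpable);
        }
    }

    void dump(PrintWriter pw, String[] args) {
        final boolean hasArgs = args != null && args.length > 0;
        synchronized (dumpables) {
            if (hasArgs && "--list".equals(args[0])) {
                // Print only the registered names.
                for (String name : dumpables.keySet()) {
                    pw.print(name + "\n");
                }
                return;
            }
            if (hasArgs && "--name".equals(args[0])) {
                if (args.length < 2) {
                    pw.print("Must pass at least one argument to --name\n");
                    return;
                }
                Dumpable dumpable = dumpables.get(args[1]);
                if (dumpable == null) {
                    pw.print("No dumpable named " + args[1] + "\n");
                    return;
                }
                // The remaining arguments are forwarded to the selected dumpable.
                dumpable.dump(pw, Arrays.copyOfRange(args, 2, args.length));
                return;
            }
            // No selector: dump every registered object under its name.
            for (Dumpable dumpable : dumpables.values()) {
                pw.print(dumpable.getDumpableName() + ":\n");
                dumpable.dump(pw, args);
            }
        }
    }

    /** Hypothetical helper: a dumpable that prints one fixed line. */
    static Dumpable named(String name, String body) {
        return new Dumpable() {
            @Override public String getDumpableName() { return name; }
            @Override public void dump(PrintWriter pw, String[] args) { pw.print(body + "\n"); }
        };
    }

    /** Hypothetical sample registry; the dumped text is made up for illustration. */
    static MiniDumper sample() {
        MiniDumper dumper = new MiniDumper();
        dumper.addDumpable(named("SystemServer", "runtime restart: false"));
        dumper.addDumpable(named("SystemServiceManager", "services: 0"));
        return dumper;
    }

    /** Runs one dump request and returns the text it produced. */
    static String run(MiniDumper dumper, String... args) {
        StringWriter out = new StringWriter();
        PrintWriter pw = new PrintWriter(out);
        dumper.dump(pw, args);
        pw.flush();
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(run(sample(), "--list"));
    }
}
```

On a device the equivalent requests would presumably go through dumpsys with the service name system_server_dumper followed by --list or --name; the sketch only models what happens once the arguments reach dump().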
SystemServiceManager
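Right after registering system_server_dumper, run() creates a SystemServiceManager and adds it to the local services. Note that it is only added to the in-process local service list and is not registered through ServiceManager: it is not a Binder, so it cannot be registered there.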
    public final class SystemServiceManager implements Dumpable {
        ...
    }

    public final class LocalServices {
        private LocalServices() {}

        private static final ArrayMap<Class<?>, Object> sLocalServiceObjects =
                new ArrayMap<Class<?>, Object>();

        @SuppressWarnings("unchecked")
        public static <T> T getService(Class<T> type) {
            synchronized (sLocalServiceObjects) {
                return (T) sLocalServiceObjects.get(type);
            }
        }

        public static <T> void addService(Class<T> type, T service) {
            synchronized (sLocalServiceObjects) {
                if (sLocalServiceObjects.containsKey(type)) {
                    throw new IllegalStateException("Overriding service registration");
                }
                sLocalServiceObjects.put(type, service);
            }
        }

        @VisibleForTesting
        public static <T> void removeServiceForTest(Class<T> type) {
            synchronized (sLocalServiceObjects) {
                sLocalServiceObjects.remove(type);
            }
        }
    }
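Because it is not a remote Binder service, it cannot be registered with ServiceManager, so it is registered with LocalServices instead. LocalServices keeps a static map of local services. Note that **the key for the SystemServiceManager service is SystemServiceManager.class**, as the addService call in run() shows. The main job of SystemServiceManager is to start and manage the other services, which we look at next.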
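The class-keyed lookup can be tried outside Android with a small model. The sketch below mirrors the addService/getService logic quoted above in plain Java; MiniLocalServices, Greeter, and the two helper methods are hypothetical names introduced only for this example, and HashMap stands in for ArrayMap.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical plain-Java model of a class-keyed, in-process service registry. */
final class MiniLocalServices {
    private static final Map<Class<?>, Object> SERVICES = new HashMap<>();

    private MiniLocalServices() {}

    static <T> void addService(Class<T> type, T service) {
        synchronized (SERVICES) {
            // A type can be registered only once.
            if (SERVICES.containsKey(type)) {
                throw new IllegalStateException("Overriding service registration");
            }
            SERVICES.put(type, service);
        }
    }

    static <T> T getService(Class<T> type) {
        synchronized (SERVICES) {
            // Returns null when nothing is registered for the type.
            return type.cast(SERVICES.get(type));
        }
    }

    /** Hypothetical local service; it is a plain object, not a Binder. */
    static final class Greeter {
        String greet() {
            return "hello";
        }
    }

    /** Registers a Greeter on first use, then looks it up by its class. */
    static String lookupGreeting() {
        if (getService(Greeter.class) == null) {
            addService(Greeter.class, new Greeter());
        }
        return getService(Greeter.class).greet();
    }

    /** Returns true if a second registration under the same key is refused. */
    static boolean secondRegistrationRejected() {
        lookupGreeting(); // make sure a Greeter is already registered
        try {
            addService(Greeter.class, new Greeter());
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(lookupGreeting());
        System.out.println(secondRegistrationRejected());
    }
}
```

Keying by class means a caller needs nothing but the type to find the instance, which is why the lookup key has to match the class passed to addService.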
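Starting services in batches
The last and most important part of system_server's flow is starting the various services. Note that the objects started here are Java-layer SystemService subclasses, not Binder objects, so they cannot themselves be registered with ServiceManager. A service that has to be reachable over Binder publishes a separate Binder object, as PowerManagerService does with publishBinderService in its onStart, shown later.
Bootstrap services
These are the services that bootstrap the system, a very important group.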
    public final class SystemServer implements Dumpable {

        private void startBootstrapServices(@NonNull TimingsTraceAndSlog t) {
            t.traceBegin("startBootstrapServices");

            final Watchdog watchdog = Watchdog.getInstance();
            watchdog.start();
            ...
            ActivityTaskManagerService atm = mSystemServiceManager.startService(
                    ActivityTaskManagerService.Lifecycle.class).getService();
            mActivityManagerService = ActivityManagerService.Lifecycle.startService(
                    mSystemServiceManager, atm);
            mActivityManagerService.setSystemServiceManager(mSystemServiceManager);
            ...
            mPowerManagerService = mSystemServiceManager.startService(PowerManagerService.class);
            ...
            mActivityManagerService.initPowerManagement();
            ...
            IPackageManager iPackageManager;
            t.traceBegin("StartPackageManagerService");
            try {
                Watchdog.getInstance().pauseWatchingCurrentThread("packagemanagermain");
                Pair<PackageManagerService, IPackageManager> pmsPair = PackageManagerService.main(
                        mSystemContext, installer, domainVerificationService,
                        mFactoryTestMode != FactoryTest.FACTORY_TEST_OFF, mOnlyCore);
                mPackageManagerService = pmsPair.first;
                iPackageManager = pmsPair.second;
            } finally {
                Watchdog.getInstance().resumeWatchingCurrentThread("packagemanagermain");
            }
            ...
        }
    }
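Many services are started here, including several familiar ones. We will not look at what each of them does yet. Instead we focus on two things: how a service is started, and how the watchdog keeps these services running properly.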
    public final class SystemServiceManager implements Dumpable {
        public <T extends SystemService> T startService(Class<T> serviceClass) {
            try {
                final String name = serviceClass.getName();
                final T service;
                try {
                    Constructor<T> constructor = serviceClass.getConstructor(Context.class);
                    service = constructor.newInstance(mContext);
                }...

                startService(service);
                return service;
            } finally {
                Trace.traceEnd(Trace.TRACE_TAG_SYSTEM_SERVER);
            }
        }

        public void startService(@NonNull final SystemService service) {
            String className = service.getClass().getName();
            if (mServiceClassnames.contains(className)) {
                Slog.i(TAG, "Not starting an already started service " + className);
                return;
            }
            mServiceClassnames.add(className);
            mServices.add(service);

            long time = SystemClock.elapsedRealtime();
            try {
                service.onStart();
            } catch (RuntimeException ex) {
                throw new RuntimeException("Failed to start service "
                        + service.getClass().getName()
                        + ": onStart threw an exception", ex);
            }
            warnIfTooLong(SystemClock.elapsedRealtime() - time, service, "onStart");
        }
    }
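Because all of these services extend SystemService, starting one means invoking its constructor reflectively through the class to obtain an instance, recording the instance in a local collection so that the same service is not started twice, and then calling its onStart method, which is the actual start. A service should therefore do its own startup work in onStart.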
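The reflective start and the duplicate check can be reproduced in a few lines of plain Java. This is only a sketch under stated assumptions: MiniServiceManager, Service, and EchoService are hypothetical, and a String takes the place of the Android Context that the real constructor receives.

```java
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical plain-Java model of starting services reflectively. */
final class MiniServiceManager {
    /** Simplified stand-in for SystemService; a String replaces the Context. */
    abstract static class Service {
        protected final String context;

        protected Service(String context) {
            this.context = context;
        }

        abstract void onStart();
    }

    /** Hypothetical service that only counts how often onStart runs. */
    static final class EchoService extends Service {
        static int onStartCalls = 0;

        public EchoService(String context) {
            super(context);
        }

        @Override
        void onStart() {
            onStartCalls++;
        }
    }

    private final String context;
    private final Set<String> startedClassNames = new HashSet<>();
    private final List<Service> services = new ArrayList<>();

    MiniServiceManager(String context) {
        this.context = context;
    }

    /** Creates the service through its public one-argument constructor, then starts it. */
    <T extends Service> T startService(Class<T> serviceClass) {
        final T service;
        try {
            Constructor<T> constructor = serviceClass.getConstructor(String.class);
            service = constructor.newInstance(context);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("Failed to create service " + serviceClass.getName(), e);
        }
        startService(service);
        return service;
    }

    /** Starts an instance unless a service of the same class was already started. */
    void startService(Service service) {
        String className = service.getClass().getName();
        if (!startedClassNames.add(className)) {
            return; // already started: skip onStart
        }
        services.add(service);
        service.onStart();
    }

    /** Requests the same class twice and reports how many times onStart ran. */
    static int startSameClassTwice() {
        EchoService.onStartCalls = 0;
        MiniServiceManager manager = new MiniServiceManager("system-context");
        manager.startService(EchoService.class);
        manager.startService(EchoService.class);
        return EchoService.onStartCalls;
    }

    public static void main(String[] args) {
        System.out.println("onStart calls: " + startSameClassTwice());
    }
}
```

As in the quoted code, the second request still constructs a fresh instance, but the class-name check stops it before onStart, so the service logic runs once.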
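Watchdog
Watchdog safeguards the normal operation of the services. It is started through start() at the very beginning of the bootstrap phase, and it can be regarded as an independent service, or rather an independent thread. A service that wants to be watched has to add itself to Watchdog in its onStart. Let us use PowerManagerService as an example: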
    public final class PowerManagerService extends SystemService implements Watchdog.Monitor {
        @Override
        public void onStart() {
            publishBinderService(Context.POWER_SERVICE, mBinderService, false,
                    DUMP_FLAG_PRIORITY_DEFAULT | DUMP_FLAG_PRIORITY_CRITICAL);
            publishLocalService(PowerManagerInternal.class, mLocalService);
            Watchdog.getInstance().addMonitor(this);
            Watchdog.getInstance().addThread(mHandler);
        }

        @Override
        public void monitor() {
            synchronized (mLock) {
            }
        }
    }
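PowerManagerService implements Watchdog.Monitor. On start it first adds itself as a monitor and then adds a thread, namely the thread behind its Handler. We go through this step by step, starting with adding the monitor: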
    public class Watchdog implements Dumpable {

        private final HandlerChecker mMonitorChecker;
        private final ArrayList<HandlerCheckerAndTimeout> mHandlerCheckers = new ArrayList<>();

        private Watchdog() {
            mThread = new Thread(this::run, "watchdog");
            mMonitorChecker = new HandlerChecker(FgThread.getHandler(), "foreground thread");
            mHandlerCheckers.add(withDefaultTimeout(mMonitorChecker));
            ...
        }

        public void addMonitor(Monitor monitor) {
            synchronized (mLock) {
                mMonitorChecker.addMonitorLocked(monitor);
            }
        }

        public void addThread(Handler thread) {
            synchronized (mLock) {
                final String name = thread.getLooper().getThread().getName();
                mHandlerCheckers.add(withDefaultTimeout(new HandlerChecker(thread, name)));
            }
        }
    }
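Watchdog holds a final list containing everything that has to be checked. Its constructor creates one checker up front, for the foreground thread. addMonitor adds the monitor to that foreground-thread checker, whereas addThread adds a new checker to the list, at the same level as the foreground-thread checker. Here is the structure of HandlerChecker: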
    public class Watchdog implements Dumpable {

        public final class HandlerChecker implements Runnable {
            private final Handler mHandler;
            private final String mName;
            private final ArrayList<Monitor> mMonitors = new ArrayList<Monitor>();
            private final ArrayList<Monitor> mMonitorQueue = new ArrayList<Monitor>();
            private long mWaitMax;
            private boolean mCompleted;
            private Monitor mCurrentMonitor;
            private long mStartTime;
            private int mPauseCount;

            HandlerChecker(Handler handler, String name) {
                mHandler = handler;
                mName = name;
                mCompleted = true;
            }
        }
    }
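HandlerChecker is an inner class of Watchdog. It holds a list of Monitors, and a check amounts to running the Monitors of every HandlerChecker. Checkers are stored wrapped in a HandlerCheckerAndTimeout, which simply pairs a checker with an optional custom timeout; the ones added through addThread go through withDefaultTimeout and therefore fall back to the default timeout.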
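The difference between addMonitor and addThread is easiest to see in the resulting data structure. The following sketch models only that bookkeeping, with no Handler or thread involved; MiniWatchdogRegistry, Checker, and the sample method are hypothetical names, and a thread name stands in for the Handler that the real addThread receives.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of how monitors and threads are registered; no real threads are used. */
final class MiniWatchdogRegistry {
    interface Monitor {
        void monitor();
    }

    /** Simplified stand-in for HandlerChecker: a name plus the monitors to run. */
    static final class Checker {
        final String name;
        final List<Monitor> monitors = new ArrayList<>();

        Checker(String name) {
            this.name = name;
        }
    }

    private final Object lock = new Object();
    private final Checker monitorChecker = new Checker("foreground thread");
    private final List<Checker> checkers = new ArrayList<>();

    MiniWatchdogRegistry() {
        // The foreground-thread checker exists from the start.
        checkers.add(monitorChecker);
    }

    /** Monitors all end up in the single foreground-thread checker. */
    void addMonitor(Monitor monitor) {
        synchronized (lock) {
            monitorChecker.monitors.add(monitor);
        }
    }

    /** Each added thread gets a checker of its own, next to the foreground one. */
    void addThread(String threadName) {
        synchronized (lock) {
            checkers.add(new Checker(threadName));
        }
    }

    int checkerCount() {
        synchronized (lock) {
            return checkers.size();
        }
    }

    int foregroundMonitorCount() {
        synchronized (lock) {
            return monitorChecker.monitors.size();
        }
    }

    /** Two services register a monitor; one of them also registers its thread. */
    static MiniWatchdogRegistry sample() {
        MiniWatchdogRegistry registry = new MiniWatchdogRegistry();
        registry.addMonitor(() -> { });
        registry.addMonitor(() -> { });
        registry.addThread("PowerManagerService");
        return registry;
    }

    public static void main(String[] args) {
        MiniWatchdogRegistry registry = sample();
        System.out.println(registry.checkerCount() + " checkers, "
                + registry.foregroundMonitorCount() + " foreground monitors");
    }
}
```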

Next comes the start of Watchdog itself. Note that it is started before the services are; each service then adds itself to Watchdog when it starts:
    public class Watchdog implements Dumpable {
        private static final long DEFAULT_TIMEOUT = DB ? 10 * 1000 : 60 * 1000;
        ...
        private void run() {
            boolean waitedHalf = false;

            while (true) {
                List<HandlerChecker> blockedCheckers = Collections.emptyList();
                String subject = "";
                boolean allowRestart = true;
                int debuggerWasConnected = 0;
                boolean doWaitedHalfDump = false;
                final long watchdogTimeoutMillis = mWatchdogTimeoutMillis;
                final long checkIntervalMillis = watchdogTimeoutMillis / 2;
                final ArrayList<Integer> pids;
                synchronized (mLock) {
                    long timeout = checkIntervalMillis;
                    for (int i = 0; i < mHandlerCheckers.size(); i++) {
                        HandlerCheckerAndTimeout hc = mHandlerCheckers.get(i);
                        hc.checker().scheduleCheckLocked(hc.customTimeoutMillis()
                                .orElse(watchdogTimeoutMillis * Build.HW_TIMEOUT_MULTIPLIER));
                    }
                    ...
                }
            }
        }
    }
As the code shows, the running Watchdog iterates over all checkers and calls scheduleCheckLocked on each, then waits with wait(). Let us first see what a checker does:
    public class Watchdog implements Dumpable {

        public final class HandlerChecker implements Runnable {
            public void scheduleCheckLocked(long handlerCheckerTimeoutMillis) {
                mWaitMax = handlerCheckerTimeoutMillis;
                if (mCompleted) {
                    mMonitors.addAll(mMonitorQueue);
                    mMonitorQueue.clear();
                }
                if ((mMonitors.size() == 0 && mHandler.getLooper().getQueue().isPolling())
                        || (mPauseCount > 0)) {
                    mCompleted = true;
                    return;
                }
                if (!mCompleted) {
                    return;
                }
                mCompleted = false;
                mCurrentMonitor = null;
                mStartTime = SystemClock.uptimeMillis();
                mHandler.postAtFrontOfQueue(this);
            }

            @Override
            public void run() {
                final int size = mMonitors.size();
                for (int i = 0; i < size; i++) {
                    synchronized (mLock) {
                        mCurrentMonitor = mMonitors.get(i);
                    }
                    mCurrentMonitor.monitor();
                }
                synchronized (mLock) {
                    mCompleted = true;
                    mCurrentMonitor = null;
                }
            }
        }
    }
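Scheduling a checker ends up posting it to the front of its Handler's message queue, so the Monitors it holds run on that Handler's thread. In PowerManagerService, monitor() does nothing except acquire the lock and release it. By default many services implement the check this way: merely acquiring and releasing their lock is enough to tell whether it is stuck, for example in a deadlock, because a monitor() that cannot get the lock never returns.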
    public final class PowerManagerService extends SystemService implements Watchdog.Monitor {
        @Override
        public void monitor() {
            synchronized (mLock) {
            }
        }
    }
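Why an empty synchronized block is a useful health check can be shown with ordinary Java threads. In the sketch below, which is an illustration and not Watchdog's implementation, a helper thread plays the role of the checker and a CountDownLatch with a timeout plays the role of Watchdog's wait; LockProbeDemo and its method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical demo: an empty synchronized block only returns if the lock can be taken. */
final class LockProbeDemo {
    /** Runs the probe on a helper thread; true if it finished within the timeout. */
    static boolean probeCompletes(Object lock, long timeoutMillis) {
        CountDownLatch done = new CountDownLatch(1);
        Thread checker = new Thread(() -> {
            synchronized (lock) {
                // Nothing to do: getting here already proves the lock was free.
            }
            done.countDown();
        }, "checker");
        checker.setDaemon(true);
        checker.start();
        try {
            return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Simulates a stuck service that never releases its lock while the probe runs. */
    static boolean probeWhileLockIsHeld(long timeoutMillis) {
        Object lock = new Object();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (lock) {
                held.countDown();
                try {
                    release.await(); // keep the lock until the probe is over
                } catch (InterruptedException ignored) {
                    // fall through and release the lock
                }
            }
        }, "stuck-service");
        holder.setDaemon(true);
        holder.start();
        try {
            held.await(); // probe only after the lock is definitely taken
            return probeCompletes(lock, timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            release.countDown();
        }
    }

    public static void main(String[] args) {
        System.out.println("free lock: " + probeCompletes(new Object(), 2000));
        System.out.println("held lock: " + probeWhileLockIsHeld(200));
    }
}
```

The probe itself carries no logic; all the information is in whether it comes back in time, which is the property the watchdog loop evaluates.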
For a service of our own we could put custom checking logic into monitor(). Back in Watchdog's run, we continue further down:
    public class Watchdog implements Dumpable {
        private int evaluateCheckerCompletionLocked() {
            int state = COMPLETED;
            for (int i = 0; i < mHandlerCheckers.size(); i++) {
                HandlerChecker hc = mHandlerCheckers.get(i).checker();
                state = Math.max(state, hc.getCompletionStateLocked());
            }
            return state;
        }

        private void run() {
            boolean waitedHalf = false;

            while (true) {
                ...
                synchronized (mLock) {
                    ...
                    long start = SystemClock.uptimeMillis();
                    while (timeout > 0) {
                        try {
                            mLock.wait(timeout);
                        } catch (InterruptedException e) {
                            Log.wtf(TAG, e);
                        }
                        timeout = checkIntervalMillis - (SystemClock.uptimeMillis() - start);
                    }
                    final int waitState = evaluateCheckerCompletionLocked();
                    if (waitState == COMPLETED) {
                        waitedHalf = false;
                        continue;
                    } else if (waitState == WAITING) {
                        continue;
                    } else if (waitState == WAITED_HALF) {
                        if (!waitedHalf) {
                            waitedHalf = true;
                            blockedCheckers = getCheckersWithStateLocked(WAITED_HALF);
                            subject = describeCheckersLocked(blockedCheckers);
                            pids = new ArrayList<>(mInterestingJavaPids);
                            doWaitedHalfDump = true;
                        } else {
                            continue;
                        }
                    } else {
                        blockedCheckers = getCheckersWithStateLocked(OVERDUE);
                        subject = describeCheckersLocked(blockedCheckers);
                        allowRestart = mAllowRestart;
                        pids = new ArrayList<>(mInterestingJavaPids);
                    }
                }
                logWatchog(doWaitedHalfDump, subject, pids);
                if (doWaitedHalfDump) {
                    continue;
                }

                IActivityController controller;
                synchronized (mLock) {
                    controller = mController;
                }
                if (controller != null) {
                    try {
                        Binder.setDumpDisabled("Service dumps disabled due to hung system process.");
                        int res = controller.systemNotResponding(subject);
                        if (res >= 0) {
                            Slog.i(TAG, "Activity controller requested to coninue to wait");
                            waitedHalf = false;
                            continue;
                        }
                    } catch (RemoteException e) {
                    }
                }

                if (Debug.isDebuggerConnected()) {
                    debuggerWasConnected = 2;
                }
                if (debuggerWasConnected >= 2) {
                    Slog.w(TAG, "Debugger connected: Watchdog is *not* killing the system process");
                } else if (debuggerWasConnected > 0) {
                    Slog.w(TAG, "Debugger was connected: Watchdog is *not* killing the system process");
                } else if (!allowRestart) {
                    Slog.w(TAG, "Restart not allowed: Watchdog is *not* killing the system process");
                } else {
                    Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
                    WatchdogDiagnostics.diagnoseCheckers(blockedCheckers);
                    Slog.w(TAG, "*** GOODBYE!");
                    if (!Build.IS_USER && isCrashLoopFound()
                            && !WatchdogProperties.should_ignore_fatal_count().orElse(false)) {
                        breakCrashLoop();
                    }
                    Process.killProcess(Process.myPid());
                    System.exit(10);
                }
                waitedHalf = false;
            }
        }
    }
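Watchdog's timeout is 60 seconds (10 seconds when the DB debug flag is set). If the system stalls, blocks, or deadlocks for longer than that, Watchdog kills the system_server process, which forces the system to restart. The overall flow: Watchdog schedules every checker registered with it; each checker posts itself to its Handler so that its Monitors run on that thread; Watchdog then waits 30 seconds, half the timeout, and evaluates the checkers. If all of them have completed, it loops and schedules the next check. If some checker has not completed but has been blocked for less than 30 seconds, it loops and keeps waiting. If a checker has been blocked for more than 30 seconds, Watchdog logs information about the blocked checkers once and continues. If a checker has been blocked for more than 60 seconds, Watchdog kills the process and the phone restarts, unless a debugger is attached, restarts are disallowed, or the activity controller asks it to keep waiting.
What Watchdog checks, then, is whether the Handler threads added through addThread are blocked, and whether system_server's foreground thread, together with the services monitored on it, is blocked.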
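The excerpt calls getCompletionStateLocked() without showing it. The sketch below is an assumption about how the four states could be derived from a checker's latency, chosen to match the 30-second and 60-second thresholds described above; CompletionStateDemo, both method names, and the numeric values of the constants are hypothetical. Only the ordering of the constants matters, since the states are combined with Math.max.

```java
/** Hypothetical model of the per-checker state and of combining states across checkers. */
final class CompletionStateDemo {
    // Assumed values; what matters is that a worse state has a larger number.
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    /**
     * Classifies one checker.
     *
     * @param completed      whether the checker finished running its monitors
     * @param latencyMillis  time since the check was scheduled
     * @param waitMaxMillis  timeout of this checker, for example 60000
     */
    static int completionState(boolean completed, long latencyMillis, long waitMaxMillis) {
        if (completed) {
            return COMPLETED;
        }
        if (latencyMillis < waitMaxMillis / 2) {
            return WAITING;      // blocked, but for less than half the timeout
        }
        if (latencyMillis < waitMaxMillis) {
            return WAITED_HALF;  // blocked for at least half the timeout
        }
        return OVERDUE;          // blocked for the full timeout or longer
    }

    /** The overall state is the worst state among all checkers. */
    static int worstState(int... states) {
        int worst = COMPLETED;
        for (int state : states) {
            worst = Math.max(worst, state);
        }
        return worst;
    }

    public static void main(String[] args) {
        int fg = completionState(true, 0, 60_000);
        int power = completionState(false, 45_000, 60_000);
        System.out.println("overall state: " + worstState(fg, power));
    }
}
```

Taking the maximum means a single blocked checker is enough to move the whole watchdog into the waited-half or overdue branch, even if every other thread is healthy.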
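Core services and other services
Core services follow much the same logic as bootstrap services: they too are started through SystemServiceManager. startOtherServices starts the remaining, lower-priority services; the familiar WindowManagerService is one of them, and a custom service of our own would normally be added there as well. Whether a service is watched does not follow from the group it is started in: Watchdog only checks services that register themselves through addMonitor or addThread.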
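Summary
That covers most of the system_server process. As the first Java process forked by zygote, its main job is to start the various system services, which run inside it. It first registers a dump service for debugging from the command line and then starts the familiar services through SystemServiceManager. The services are grouped by importance, with the bootstrap services being the most important. Watchdog monitors the services that register with it; once one of them, or system_server itself, hangs, Watchdog kills the process. This is also why the familiar AMS and PMS run inside the system_server process.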