| # What’s Up With Processes |
| |
| This is a transcript of [What's Up With |
| That](https://www.youtube.com/playlist?list=PL9ioqAuyl6ULIdZQys3fwRxi3G3ns39Hq) |
| Episode 8, a 2023 video discussion between [Sharon (yangsharon@chromium.org) |
| and Darin (darin@chromium.org)](https://www.youtube.com/watch?v=SD3cjzZl25I). |
| |
| The transcript was automatically generated by speech-to-text software. It may |
| contain minor errors. |
| |
| --- |
| |
| Chrome has a lot of process types. What is a process? What are all the types? |
| How do they work together? Today’s special guest to tell us more is Darin. |
| Darin is one of the founding members of the Chrome team, and wrote the initial |
| implementation of the multi-process architecture. |
| |
| Notes: |
| - https://docs.google.com/document/d/1uXF-ncJ98LWQMN7M3NA_2oYkVmW9Vzp0v-wkJaNpsDQ/edit |
| |
| Links: |
| - [Chrome comic](https://www.google.com/googlebooks/chrome/small_00.html) |
| - [What's Up With Mojo](https://www.youtube.com/watch?v=at_35qCGJPQ) |
| - [What's Up With Open Source](https://www.youtube.com/watch?v=zOr64ee7FV4) |
| - [What's Up With //content](https://www.youtube.com/watch?v=SD3cjzZl25I) |
| - [Life of a Process](https://www.youtube.com/watch?v=5im7SGmJxnA) |
| - [Chrome Compositing](https://chromium.googlesource.com/chromium/src/+/HEAD/docs/how_cc_works.md) |
| - [Site Isolation papers by Charlie](https://charlesreis.com/research/publications/) |
| |
| --- |
| |
| 00:00 SHARON: Hello, and welcome to "What's Up With That," the series that |
| demystifies all things Chrome. I'm your host, Sharon, and today, we're talking |
| about processes. There are so many process types in Chrome. How do they form |
| the multi-process architecture? What exactly is a process? Here to answer all |
| of that and more is today's special guest, Darin. Darin was one of the founding |
| members of the Chrome team and pretty much did the first implementation of the |
| multi-process architecture, so he is well-suited to answer all of this. Plus, |
| he created the IPC channels that Chrome started with. If you want to learn more |
| about IPC and Mojo, check out the last episode with Daniel for lots more on |
| that. So hello. Welcome, Darin. Welcome to the show. Thanks for being here. |
| |
| 00:38 DARIN: Thank you. Great to be here. |
| |
| 00:38 SHARON: Yeah, cool. So first question, what is a process? |
| |
| 00:44 DARIN: Right, so a process is the container in which applications run on |
| your system. Every process has both its own executing set of threads, but it |
| also has its own memory space. That way, processes have their own independent |
| memory, their own independent data, and their own independent execution. The |
| system is multitasking across all of the processes on the system. |
| |
| 01:13 SHARON: Cool. Chrome is basically an operating system that runs on top of |
| your operating system. So there probably are parallels between Chrome's |
| representation of a process and the actual operating system ones. So what are |
| the similarities and differences, and how do they interact? |
| |
| 01:30 DARIN: Well, yeah, I mean, you can talk about a lot of different things. |
| I mean, so Chrome is made up of multiple processes. We run different tasks in |
| different processes. That's done for multiple reasons. One is so that they can |
| run independently, so that there's performance benefits that come from the fact |
| that they're running independently. Back in the day, the original idea was that |
| it would allow us to take advantage of the operating system's preemptive |
| multitasking that it already has and to actually allow web pages to run |
| concurrently and to be managed just like any other concurrent task that the |
| operating system would manage. So that's the original idea there. And in that |
| way, this model of Chrome divided into multiple processes just allows the |
| Chrome itself and all of the tasks that it has to really take advantage of |
| multi-core systems so that if you have more computing power, if you have more |
| cores, you have more hyperthreading going on in your system, then it's possible |
| for more things to happen concurrently. And Chrome's workload can be spread out |
| that way because Chrome is broken into all of these different processes and all |
| of these different threads. In that way, it's taking advantage of and mirroring |
| the capabilities of the OS and providing that as a substrate for web and for |
| browser and for how all these things work. How Chrome then has to be similar is |
| that also, like an OS, Chrome has to manage all this stuff. And from simple |
| things like how much resource should a background tab be using, should its |
| timers be running when it's in the background, to much more complicated things |
| when you talk about even should a process stay alive or not. If you look at |
| Chrome OS where system resources can be so limited, it's necessary, or on |
| mobile, necessary to terminate some of those background processes to close some |
| of those tabs behind the scenes, even if the application makes it look like |
| those tabs are still open. So the level of management is a big part of - in |
| that way, it's being kind of like an OS. |
| |
| 03:42 SHARON: Is Chrome's representation of a process, are those generally |
| one-to-one with a system process, depending on which system you're on - |
| |
| 03:48 DARIN: Absolutely. |
| |
| 03:48 SHARON: or is that an abstraction layer? |
| |
| 03:55 DARIN: No, well, absolutely when we talk about a process in Chrome, we |
| mean an OS process. And so we might have multiple web pages being served by |
| that single renderer process. We do try to spread the load across multiple |
| processes, but we also independently decide how many processes to actually |
| create. And it can be based on - there could be good reasons from, like I said, |
| a performance perspective to having tabs assigned across multiple processes, |
| but there can also be good security properties, like letting the web pages be |
| allocated to different processes means that those web pages are not running in |
| the same process, meaning they're not running in the same address space. And |
| from a security perspective, that has really great properties because it means |
| if a web page is able to tickle a bug in the rendering engine, in V8 or in |
| part of Blink and somehow get a privilege escalation, like start to be able to |
| do things that JavaScript normally can't do, it's still going to be limited by |
| the capabilities of that process and what it has access to. And so if that |
| process has really only the data for the web page that was providing the |
| problematic JavaScript, well, it's not really getting access to anything it |
| didn't already have. And that's kind of the whole idea of process isolation and |
| sandboxing. And then on top of that, you limit the capabilities of that process |
| by really leveraging the OS process primitive and the kinds of restrictions and |
| capabilities that can be removed from that process to achieve an isolation for |
| web pages for an origin or for a set of web pages. I say set because we might |
| not want to allocate a process for every single tab or for every single origin |
| because that might just use up way too many system resources. So we have to be |
| thoughtful there, too. |
| |
| 05:50 SHARON: Yeah, so this is quite closely related to site isolation, which |
| isn't the topic of this video - maybe the next one. So terms that are used |
| often and sometimes interchangeably are multi-process architecture and process |
| model. So these aren't exactly the same thing, but I think can you explain the |
| difference between them and what each one is for? Because there are |
| similarities, but. |
| |
| 06:16 DARIN: Sure. I mean, I think to me, the phrase "process model," it's |
| talking about, what does a particular process represent, what does it do. And |
| then when I say multi-process architecture, I'm thinking of the whole thing. |
| It's all packaged up. It's a multi-process architecture to build a browser. At |
| the end of the day, the user is hopefully not so aware of the fact that this is how |
| it's built. I mean, earlier on in Chrome's history, the Windows Task Manager |
| didn't do a very good job of grouping processes by their parent. And so if you |
| opened the Task Manager at the OS level, you'd see just a spew of processes |
| that Chrome was responsible for. And it could be a little disconcerting for |
| people. A little tangent, but now more modern versions of Windows, they do kind |
| of group it all to the parent task. And so it's a little easier and less sort |
| of in-your-face that Chrome is creating all these processes. But yeah, at the |
| end of the day, it's just the multi-process architecture is like that's the |
| embodiment of the whole thing. And we have these different process types that |
| make up that whole thing. There's the browser process, the main one, and then a |
| renderer process is the name we give to the processes responsible for running |
| web pages. And then we have a few other process types that are part of the |
| puzzle, a networking process, a GPU process, utility process, and occasionally, |
| in the lifespan of Chrome, other types of processes. We had plugin processes, |
| for example, when we were hosting Flash in Chrome. And the Native Client had |
| its own type of processes as well. So what's that all about? Really, I can go |
| into it if you want me to go into all the details there. But - |
| |
| 08:05 SHARON: Yeah, I think we'll run through - this is a, yeah, perfect segue. |
| We'll run through each of those process types you just mentioned and mention a |
| bit about what they do, how much privilege they have, maybe how many of them |
| there are because some of them, there's only one of. So I think it makes sense |
| to start with the browser process, which is the process and is often likened to |
| the kernel in an operating system. |
| |
| 08:30 DARIN: Yeah, so the browser process - kernel, operating system, broker - these |
| are kind of good analogies for what the browser process's role is. So it's the |
| application process, the main one, that starts up initially, and it's the one |
| that hosts the whole UI of the app. And it's going to spawn these child |
| processes, the renderer processes, the GPU process, and so on, to help fulfill |
| its goals. So very early on, we started with this design where WebKit, the |
| rendering engine we were using from Apple, it could be built as a COM control |
| and register it on the system and load it as a DLL. And then in order to run |
| that in a child process, it was using HWNDs and all the standard Win32-isms to |
| do its job. And we started out by just literally trying to capture a bitmap |
| rendering of WebKit and send it over to the browser process where we could |
| present that bitmap. Actually, rewind even further. The very first version took |
| advantage of the fact that Windows supports having HWNDs hosted in different |
| processes and threads. And so we literally just took that HWND from WebKit and |
| that child process and stuck it into the window hierarchy of the browser |
| process. And we drew our browser UI around it, and WebKit was there, but it was |
| running in a different process. And if we ever needed to tell that process to |
| do something, we just send a WM_USER event with PostMessage to it. And that's |
| something Windows lets you do. So it felt like a very simple toy kind of way to |
| try it all out. A lot of limitations to that design. Pretty quickly, we |
| realized we didn't want to just be in that kind of setup, and we moved to |
| building our own IPC channel, a pipe, so that we could communicate and really |
| get to the point where WebKit's running there without an HWND, without its own |
| Win32 windowing constructs, but instead, it's just kind of an image generator. |
| And we take the image that it generates, the bitmap, send it over our IPC |
| channel to the browser process. And the browser process is where we have our |
| window hierarchy. The browser process is where we display that bitmap, and |
| where we collect user input and send it over the pipe to the renderer, where |
| we then feed it into WebKit. |
| |
| 10:46 DARIN: That was the original architecture of Chrome. So in that world, |
| the browser process is your application process. It has all the UI. And it's |
| really like this glorified image viewer. And the renderer process is literally |
| just like it's running WebKit - now Blink. It's running the rendering engine, |
| and it's producing those images whenever, like, an update occurs, a layout |
| occurs or some invalidation occurs. And we got a little fancy. It was producing |
| just the sub - it would know, oh, I really only have a small damage rect, so I |
| don't have to produce the whole image. I just produce a small part. And send |
| that over, and then we paint that into the part that the browser is retaining |
| an old image of. And it can update just that one part. And so that's a very |
| simple approach that we took when building this whole thing. And so those |
| renderer processes become very much just very simplistic in that they aren't |
| interacting with the rest of the OS in a very deep way. They are just taking |
| input events from this pipe and sending images back. When they need other |
| services like they need network access, instead of going straight to the |
| network from the renderer process, because we started to realize, hey, we might |
| want to sandbox and restrict those child processes, and also, we needed the |
| notion of a cookie jar that was shared across all web pages, so that if you visit |
| Gmail in one tab and visit Gmail in another tab, you're still logged in, we |
| needed the network stack to be in a unified place. So it meant that not just |
| would we send images up to the browser, but now we would send network requests |
| to the browser. And the browser would respond with the network data. And as a |
| result, we started to go down this path of centralizing access to system |
| services and resources in the browser process. |
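| |
| Editor's sketch (not from the talk): the damage rect idea described above can |
| be illustrated in a few lines of Python. The `DamageRect` type, the |
| `apply_damage` function, and the list-of-lists bitmap are all hypothetical |
| names chosen for illustration. They are not Chrome's actual data structures |
| or IPC messages. |
| |
```python
from dataclasses import dataclass
from typing import List

Bitmap = List[List[int]]  # rows of pixel values; a stand-in for a real bitmap


@dataclass
class DamageRect:
    """Hypothetical message: the repainted region plus its pixels."""
    x: int           # left edge of the damaged region
    y: int           # top edge of the damaged region
    pixels: Bitmap   # height = len(pixels), width = len(pixels[0])


def apply_damage(retained: Bitmap, update: DamageRect) -> None:
    """Browser side: patch only the damaged region of the retained image."""
    for row_offset, row in enumerate(update.pixels):
        target = retained[update.y + row_offset]
        target[update.x:update.x + len(row)] = row


# The browser keeps the last full image it received (4x4, all zeros here).
retained_image = [[0] * 4 for _ in range(4)]

# The renderer repainted only a 2x2 region whose top-left corner is (1, 2).
apply_damage(retained_image, DamageRect(x=1, y=2, pixels=[[7, 7], [7, 7]]))
```
| |
| Only the four repainted pixels cross the pipe in this sketch. The other twelve |
| stay as the browser last saw them. |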
| |
| 12:44 DARIN: It's becoming therefore like a broker to the system that the |
| renderer now is unable to - not unable - it's asking the browser for everything |
| it needs. It's communicating to the browser to get access to all the different |
| resources. And that allowed us to then restrict the renderer process |
| considerably so that it doesn't even have access if it wanted to touch the file |
| system, to touch the network TCP/IP implementation or any system resources. So |
| the sandbox really is all about how we apply those restrictions, taking away |
| the capabilities of a Windows process. So in the very early days, there was |
| just the browser process and renderer processes. And we would allow multiple |
| renderer processes to be created as tabs were opened. And we put some |
| restriction on the number of processes based on the amount of RAM that your |
| system would have, thinking that processes maybe have some inherent overhead, |
| which they do. Certainly, there's the overhead of the V8 heap that is allocated |
| once per process or once per isolate, if you're familiar with the details of |
| V8. And so, we didn't want to have so much of that kind of - so we thought |
| there was some limit to how many processes we should have. Later on, other |
| process types started to emerge. The next one that came was the Plugin |
| process because in order to get YouTube to work back in 2006, you needed to |
| support Flash. And Flash has two modes - it did. It had a windowed mode and a |
| windowless mode. And the difference is whether it drew itself into an HWND or |
| if it would just produce a bitmap itself. But regardless of what mode it was |
| rendering in, it still wanted direct system access, like it wanted to touch the |
| file system. And so if we were going to run it in our browser, it can't run in |
| the renderer process. It has to run somewhere else. And so, yeah, in the frenzy |
| of, gee, wouldn't it be nice if we could have sandboxing, it was, how the heck |
| are we going to sandbox and isolate plugins? Because the way plugins integrated |
| with WebKit is that WebKit just directly called into them and said, hey, if |
| it's a windowless one, give me your bitmap. I'm going to include it in my |
| rendering. If it's a windowless one, it also means it's dependent on WebKit to |
| feed it events. And so, how does that work? So we ended up building a process |
| type called the Plugin process type for NPAPI plugins, Netscape-style plugins, |
| all stuff that doesn't exist anymore. It's wonderful. And NPAPI is this |
| interface that was once upon a time, I want to say, kind of, like - my head is |
| going to some unsavory words. It was kind of pooped out by somebody at Netscape |
| to make Acrobat Reader work over the weekend. And then it became a stable API. |
| And lots of regret and sadness probably followed, but as a result, things like |
| Flash were created, and web became very interesting in some ways. A wonderful |
| story about Flash, I think. |
| |
| 16:02 DARIN: But anyways, supporting that stuff meant dealing with some gnarly |
| frozen APIs and figuring out how to stitch all that together, and the renderer |
| process of WebKit would talk to something that wasn't actually in its process, |
| that was over, again, another IPC channel, running in a whole other process. We |
| wanted plugins to still not run in our browser process, but to, instead, run in |
| their own process so that if they crashed, they wouldn't take down the whole |
| browser. And Flash and other plugins were notorious for crashing. So it was a |
| must that they run in their own process. But we figured they couldn't be |
| sandboxed as tightly as the renderer, as WebKit, because they already were |
| accessing the system in very deep ways. |
| |
| 16:55 SHARON: Cool, lots of - |
| |
| 16:55 DARIN: Lots more processes got added later, like the networking, the GPU |
| process, and NaCl. I can tell the story about those, too, if you're interested. |
| |
| 17:08 SHARON: Oh, sure. Yeah, let's hear it. |
| |
| 17:08 DARIN: OK, so 2009 era, I think, maybe 2010 - I don't know - somewhere |
| along the way, we started building Chrome for Android. And you might recall I |
| described how the renderer was really kind of a glorified image viewer, or the |
| browser, browser was sort of an image viewer and the renderer's job was to |
| produce a bitmap. And then we send it over to the browser, the browser would |
| draw the bitmap. Mobile systems were not going to work very well if this is the |
| way the drawing was going to work. If you think about how scrolling works or |
| worked back then, scrolling a web page back then meant telling the computer to |
| please memmove all the pixels, and then to draw another bitmap where pixels are |
| not existing yet and need to be drawn. So you do a memmove followed by a |
| memcpy. And so this is how original Chrome was built. If you were scrolling, it |
| would be, oh, we need to shift pixels, and here's the bitmap. We need to stick |
| in the part that's exposed. Do that all quickly, and do it over and over again. |
| And that kind of operation is just not good if your goal is like nice |
| responsive scrolling on a touch screen. Instead, the way mobile systems were |
| built is using GPU rendering and compositing engines powered by GPUs, so that, |
| instead, you are offloading a lot of that work to the GPU. So it was necessary |
| to restructure Chrome's rendering pipeline for mobile, at least. But because we |
| were doing that, we can also take advantage of it on desktop. Meanwhile, we |
| were also on desktop starting to invent things like WebGL. Initially, WebGL, |
| the precursor to that was this plugin called O3D, which is a 3D graphics plugin |
| using the wonderful plugin APIs that I talked about before. But it provided |
| this way to have 3D graphics scenes and build immersive kind of 3D content. |
| That team, at some point, switched their sights on how to make that a standard |
| through WebGL. Wonderful stories around that. But it also entailed figuring out |
| how to do OpenGL, essentially, because WebGL was just OpenGL ES, and how to do |
| that from a renderer, from that Blink child process, how to do it there. And |
| really, that meant that, OK, this process is going to be - these sandboxed |
| renderers are going to be generating a stream of GL commands. Where do they go? |
| What do we do with that? And also, we know that it's possible to write shaders |
| and possible to write GPU commands that can really wreck - can cause havoc, can |
| be problematic, can cause the system to crash your process. So we don't want |
| that happening in the browser process because we want the browser process to |
| stay up so it can [INAUDIBLE] the manager. |
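| |
| Editor's sketch (not from the talk): the memmove-then-memcpy scrolling |
| described above looks roughly like this in Python. The `scroll_up` function and |
| the one-byte-per-pixel framebuffer are assumptions made for illustration, and |
| slice assignment stands in for the C `memmove` and `memcpy` calls. |
| |
```python
def scroll_up(framebuffer: bytearray, width: int, height: int,
              dy: int, exposed_strip: bytes) -> None:
    """Scroll a 1-byte-per-pixel framebuffer up by dy rows, in place."""
    if len(framebuffer) != width * height:
        raise ValueError("framebuffer size must equal width * height")
    if not 0 < dy <= height:
        raise ValueError("dy must be between 1 and height")
    if len(exposed_strip) != dy * width:
        raise ValueError("exposed_strip must hold exactly dy rows")

    shift = dy * width
    keep = len(framebuffer) - shift

    # Step 1 (the "memmove"): slide the surviving rows toward the top.
    framebuffer[:keep] = framebuffer[shift:]

    # Step 2 (the "memcpy"): copy freshly rendered pixels into the exposed rows.
    framebuffer[keep:] = exposed_strip


# A 2x3 image with rows "aa", "bb", "cc"; scroll by one row and reveal "dd".
image = bytearray(b"aabbcc")
scroll_up(image, width=2, height=3, dy=1, exposed_strip=b"dd")
```
| |
| Every scroll step touches every pixel on the CPU, which is the cost that GPU |
| compositing avoids. |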
| |
| 20:21 DARIN: So the GPU process was born. This will be the process that |
| actually talks to the OpenGL driver or DirectX under the hood via ANGLE on |
| Windows. And so now, we set up another pipe from the renderer over to the GPU |
| process, and the stream of GL commands are being sent over there. And over |
| there, it's talking to the driver. And if you sent something bad, driver is |
| going to say no bueno and crash your process. And we would find that the |
| browser would see the GPU process died, and it would maybe give you a warning |
| or let you reload the page, and it will try again. As that's done, that's how |
| we therefore were able to leverage processes to give us that isolation, but |
| also give us that robustness, give us that capability. And that led to a lot of |
| complexity, but also a lot of really amazing sophistication around the |
| compositing engine. Chrome's cc library was born subsequently, and all these |
| things that have led to the modern way that we render the web on Chrome now. |
| Skia learned how to render to OpenGL, et cetera, and the GPU process. |
| |
| 21:35 DARIN: The next one that came along was the network process, which was really born |
| out of the idea of, gee, wouldn't it be nice to isolate the networking code |
| into its own process that could be more tightly sandboxed? Because the |
| networking stack tends to be a surface area that's accessible by attackers. |
| Just like the V8 and JavaScript engine is parsing lots of stuff and very |
| exposed to attack surface from would-be attackers, the network stack, same |
| thing. You've got HTTP parsing and various other kinds of processing happening |
| very close to content that attackers can control. And so this project, quite a |
| rather elaborate project to move the networking stack out of the browser |
| process, out of that broker process, and instead into its own process, and to |
| have all the various IPC channels connect to there instead, was born. |
| And I think this was more born in the era of Mojo IPC, where we had a more |
| flexible IPC system that could help support that kind of transition, but still |
| tons of work and quite a radical change to the flow of data and the way the |
| system works. Previously, just to give a little aside, when a renderer is |
| making a network request, the browser process acting as a broker needs to |
| audit, is it OK for that guy to be requesting this thing? Think about all the |
| kinds of rules that might be there, CSP, other kinds of things, and the |
| security origin privileges associated with it and what we want to allow a |
| renderer to actually access. Simple stuff like, we support WebUI, like chrome: |
| pages. They load in a renderer process, and that renderer process should be |
| allowed to access other things from chrome:, right? But a web page shouldn't |
| be able to. We don't want arbitrary web pages to be poking around and seeing |
| what's available at a chrome: URL. So that's |
| like a simple example of where we honor that isolation. And so the browser |
| process, having the network stack in the original incarnation of Chrome makes |
| sense. It can apply these rules right there. Safe browsing was integrated |
| there. Lots of different kinds of network filtering could be done there. Moving |
| that to another process was a big change because now browser is the one that |
| has the smarts to do auditing, but the data and all the requests are going to |
| this other process. So making that work meant a lot more plumbing. And I think |
| complexities ensued. But it's awesome to see it happen. |
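| |
| Editor's sketch (not from the talk): the broker-side audit described above can |
| be pictured as a check the browser runs before forwarding a renderer's request. |
| The `may_request` function is hypothetical and models only the one chrome: |
| rule mentioned in the talk. Chrome's real checks (CSP, origin privileges, Safe |
| Browsing and more) are far more involved. |
| |
```python
from urllib.parse import urlsplit


def may_request(initiator_url: str, target_url: str) -> bool:
    """Hypothetical browser-side check made before forwarding a request.

    Only one rule is modeled: chrome: resources may be requested by
    chrome: pages, but not by ordinary web pages.
    """
    initiator_scheme = urlsplit(initiator_url).scheme
    target_scheme = urlsplit(target_url).scheme
    if target_scheme == "chrome":
        return initiator_scheme == "chrome"
    return True


# Illustrative URLs only.
webui_ok = may_request("chrome://settings/", "chrome://resources/app.js")
web_blocked = may_request("https://example.com/", "chrome://settings/")
web_ok = may_request("https://example.com/", "https://example.org/a.png")
```
| |
| The point of the sketch is where the decision lives: in the process that the |
| renderer cannot tamper with, not in the renderer that is asking. |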
| |
| 24:20 DARIN: Anyways, I mentioned Native Client. So that was a precursor to |
| Wasm that was a big investment by the Chrome team to find a way to bring native |
| code to the web in a safe, secure manner. The initial take on it was, if you're |
| running native code that came from the web on a system, that's scary. It could |
| do like anything, right? Well, no, let's restrict the process capabilities, but |
| even with a restricted set of capabilities, you can't necessarily restrict |
| everything on Windows or Mac or Linux. There's always some limitation to the |
| sandbox capabilities. And in many ways, the sandboxes that we implemented are |
| kind of just an extra level of defense. If you think about it, the JavaScript |
| Engine is already a sandbox, right? It already limits the capabilities. The web |
| rendering engine, all the different kinds of security checks throughout the |
| code are various forms of sandboxing. And then finally, the process in the way |
| we restrict its capabilities is that next last defense. Well, running native |
| code with only that last defense in place is not enough. So Native Client was |
| designed to be native code that could be highly auditable, so |
| that you could make sure that it's not allowed to jump to an address that it |
| doesn't have code for, that it's not allowed to do things outside the set of |
| things that it's allowed to do. So it had a lot of complexity as well in terms |
| of how the process has to be set up in terms of the memory layout and various |
| other details, which maybe I'm happy to not remember. And - but it meant it |
| needed its own process type. Even though it integrated kind of like a plugin, |
| it couldn't just be a plugin. It needed its own process type. And there had to |
| be 64-bit variants and 32-bit variants, depending on the actual OS, the actual |
| underlying hardware that you were running on, Arm versus Intel, all these |
| differences. So yeah, we ended up with leveraging this process model |
| extensively to enable these kinds of things. |
| |
| 26:32 DARIN: I think I mentioned the utility process. In Chrome, the utility |
| process is this thing you reach for when you want to do something that's |
| potentially - like maybe you're dealing with some untrusted input, like you |
| want to decode an image, or you want to run something in a process, and you |
| just want to make sure that if it's going to do anything, it just dies over |
| there and doesn't take down the whole browser process. I think some extension |
| install manifest parsing, maybe various other kinds of things like that, would |
| happen in a utility process as like a safety measure. Generally speaking, |
| parsing input from the web or even the Web Store or things like that, doing |
| that parsing in the browser process is a scary thing because you're taking |
| input from a third party. And if you're parsing it there, you might have a bug |
| in your parser, and that could lead to the most trusted process having been |
| compromised. |
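| |
| Editor's sketch (not from the talk): the utility process pattern described |
| above, where parsing untrusted input happens in a child process so that a |
| failure dies over there, can be approximated with Python's `subprocess` |
| module. The `parse_in_child` helper and `CHILD_SOURCE` are hypothetical names. |
| Unlike a real utility process, this child is not sandboxed, so the sketch shows |
| crash isolation only. |
| |
```python
import json
import subprocess
import sys

# Code that runs in the child: parse stdin as JSON, print a normalized form.
CHILD_SOURCE = "import json, sys; print(json.dumps(json.load(sys.stdin)))"


def parse_in_child(untrusted_text: str):
    """Parse untrusted JSON in a separate process.

    Returns the parsed value, or None if the child failed. A failure is
    contained in the child; the calling process keeps running.
    """
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        input=untrusted_text,
        capture_output=True,
        text=True,
        timeout=10,
    )
    if result.returncode != 0:
        return None  # the child died; the parent is unaffected
    # The parent only reads the child's re-serialized output.
    return json.loads(result.stdout)


good = parse_in_child('{"name": "demo", "version": 3}')
bad = parse_in_child('{"name": ')  # malformed input fails in the child
```
| |
| In this sketch the malformed manifest makes the child exit with an error, and |
| the caller sees `None` instead of crashing itself. |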
| |
| 27:29 SHARON: Yeah, that falls into the whole Rule of Two thing, right, of |
| untrusted data. We have a [INAUDIBLE] process. It's in C++. The thing that we |
| decided to change is where it gets parsed, so. |
| |
| 27:44 DARIN: That's right. |
| |
| 27:44 SHARON: That makes sense. |
| |
| 27:44 DARIN: Yeah, so the sandbox processes get used as this primitive to give |
| us that extra safety measure. |
| |
| 27:57 SHARON: So the other process type I can think of that wasn't just covered |
| there was extensions. Is there anything to say there? |
| |
| 28:02 DARIN: Sure, of course. |
| |
| 28:02 SHARON: Of course. |
| |
| 28:02 DARIN: In some ways, an extension process will show up that way in |
| Chrome's Task Manager, but I believe it's usually just powered by a renderer, |
| an ordinary renderer, because extensions have background pages or background |
| events. I guess in Manifest V2, it was background pages. In Manifest V3, it's |
| now just event pages or a service worker type construct. And those need a process |
| to run in. So the extensions get to inject some code that runs in the renderer |
| of the web page, usually in an isolated world, so it can see the same DOM. If |
| you've given the permission for the extension to read website data or to |
| manipulate website data, it can do that by injecting a content script that will |
| run in the same process as the web page that it's reading or modifying. But it |
| will run in an isolated JavaScript context so that it's not seeing the same |
| JavaScript variables and such. But it can still see the DOM. And that's meant |
| to give a lot of capability, but also have a little bit of protection because |
| it's so easy to accidentally interfere with the same JavaScript variables and |
| things like this. OK, so extensions have that piece that injects a content |
| script, but they also have a - usually, they can have this event service worker |
| or background page that is their central place, process place for code to run. |
| And so we do run that in a renderer process. And so for example, if the |
| extension that's injected into a page wants to get some capabilities, it would |
| talk to its service worker, who would then have the capability to ask for |
| certain extension APIs to maybe understand all the tabs that are in your |
| system, depending on what permissions it was granted. And then finally, with |
| extensions, you also have the extension button and a dropdown that can occur |
| there, which a web page can be drawn there by the extension. And that's going |
| to be hosted in a renderer process, too. But that would be a web page that |
| lives at a chrome-extension: URL. And so you have these different pieces |
| of the extension model where code from the extension can be running, and it, |
| via some messaging channel, can talk to the other parts of itself that run in |
| potentially likely different processes. |
| |
| 30:37 SHARON: You mentioned service workers there, and those are kind of |
| related to all this, too. So can you tell us a bit more about those? |
| |
| 30:43 DARIN: Yes, so - well, OK, so backing up, in the context of extension, if |
| we talk about background page first, the original idea with extensions was, OK, |
| I'm injecting stuff into pages so I can modify things, but I also need like my |
| home base. I need my context where - I need a place where my persistent script |
| is running or where I can manage my databases, and I have just one place for |
| that. And it's also a place where I can get elevated permissions to access |
| other Chrome extension APIs. So that idea of a background page that the |
| extension can create that's ever present so it's like a web page, but it's |
| hidden, it's in the background, and content scripts that are injected into web |
| pages can talk to it. So they can say, oh, I'm on this page. Give me some rules |
| that I should apply to it or something, depending on the nature of that |
| extension. OK, so but background pages are, unfortunately, persistent. And they |
| live for the whole life of the browser. And they use up memory. They use up |
| resources, even if nothing else about the extension needs doing. Even if the |
| extension is not loaded into any web pages, that background page is sitting |
| there. And so this was [INAUDIBLE] quickly realized, this is not great. This is |
| a waste of resources for the system. We should have some policy for how we |
| should close that background page down and only need to create it when |
| necessary. In the context of, I think, Chrome apps, which is a thing that's no |
| longer a thing, we created this concept called event pages, which allowed for |
| these background pages to be a little more transient, that come into being only |
| as needed, which is a much more efficient approach. |
| |
| 32:28 DARIN: However, when it came time to bring that to extensions, at the |
| same time, Service Worker had been created, which was a tool for web pages to |
| be able to do background event processing. So the decision was to adopt that |
| standards-based approach to how to do background processing. And so Service |
| Worker is the construct that Manifest V3 allows extensions to use for that sort |
| of background processing. The big difference with service workers is that they |
| are not web pages. They're just JavaScript. But they can listen to different |
| kinds of events. So just like a web worker, shared worker, service worker, they |
| are without UI. They are without any HTML. They just have the ability to - but |
| they have some functions that are given to them on the global scope that lets |
| them talk to the outside world, to talk to the web page that created them, or |
| in the case of Service Worker, they actually have events they can receive to |
| handle network requests on behalf of the page. That's one of the main uses for |
| them in the context of the web. A web page would have a Service Worker register |
| it with the browser to say, hey, please contact my service worker if you are |
| making a request for my origin. And that gives the Service Worker the |
| opportunity to specify what content should be used to satisfy a URL. It could |
| load that content out of a cache, and the Service Worker API includes APIs for |
| managing caches and things like this. So all of that system that was built to |
| kind of enable web pages to operate more robustly in the context of poor |
| network connectivity or to get performance improvements for applications that |
| are more single page applications that have a basic fixed shell that should |
| load out of cache and then they make network requests to the server to get the |
| data that populates some application UI, that model Service Worker was really |
| designed for. But it seemed a very good fit for extensions. And it gets us out |
| of the world of having these persistent extension background pages. So Manifest |
| V3 says, if you want your content script to have access to privileged things, |
| you go through a system, a Service Worker. And the Service Worker will get |
| spawned in a renderer process. What renderer process? You don't know. It's up |
| to the system. Chrome will make a decision there based on all of its usual |
| rules around what other origins are in that process, thinking from a security |
| isolation perspective, and so on, and so forth. |
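| |
| As a rough illustration of the cache-first pattern described above, here is a |
| minimal TypeScript sketch. `InMemoryCache`, `Network`, and `cacheFirst` are |
| hypothetical stand-ins so it can run outside a browser; they are not the real |
| Service Worker or Cache APIs. |
| |
```typescript
// Hypothetical stand-ins for the browser's cache storage and network, so the
// sketch runs outside a browser. Response bodies are plain strings here.
type Network = (url: string) => Promise<string>;

class InMemoryCache {
  private entries = new Map<string, string>();

  async match(url: string): Promise<string | undefined> {
    return this.entries.get(url);
  }

  async put(url: string, body: string): Promise<void> {
    this.entries.set(url, body);
  }
}

// Cache-first strategy: answer from the cache when possible; otherwise go to
// the network once and remember the result for next time.
async function cacheFirst(
  url: string,
  cache: InMemoryCache,
  network: Network,
): Promise<string> {
  const cached = await cache.match(url);
  if (cached !== undefined) {
    return cached;
  }
  const body = await network(url);
  await cache.put(url, body);
  return body;
}
```
| |
| In an actual service worker, the same idea would typically live in a `fetch` |
| event listener that calls `event.respondWith(...)`, trying |
| `caches.match(event.request)` first and falling back to `fetch(event.request)`. |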
| |
| 35:22 SHARON: Cool. A lot of these process types have been added over time as |
| the need for them arises. Like, oh, we want to put network stuff in a separate |
| process. So apart from adding more process types, what have been other big |
| changes to the multi-process architecture and processes in Chrome in the many |
| years since launch? |
| |
| 35:44 DARIN: The biggest one by far is the per site isolation, the site |
| isolation work that was done. |
| |
| 35:51 SHARON: We'll talk about that more next. |
| |
| 35:56 DARIN: Yeah, so, I mean, well, I'll just say, so Charlie Reis was an |
| intern on Chrome team back in the day during the pre-release period of Chrome. |
| And I remember the conversations where we were like, gee, wouldn't it be nice |
| if instead of isolating based on per tab, it was isolating per origin? And I |
| think he was doing research on that topic, too. And he had all these ideas for |
| this kind of a thing. And so it was really kind of very early on that we were |
| having these conversations. But even very early on, it was like, this is going |
| to be a big change, you know? It's a big change to the rendering engine |
| itself, like how frames could be served by different |
| processes. So in order to isolate based on origin, you have to say a frame |
| where an ad might live would actually have to be served by the process for that |
| origin. And so now no longer is the whole frame tree just in one process. |
| That's a big change. But built on top of the infrastructure we had, it was |
| possible to imagine it, and it was quite a journey to get there. So that was |
| probably the biggest change to the architecture. But like I mentioned before, |
| actually, other big changes were definitely the introduction of the GPU |
| process, definitely the introduction of Mojo IPC. Before Mojo IPC, the way |
| things worked was, basically, messaging was much simpler, in some ways, easier |
| to understand, but also much more the case that there were these files that |
| really needed to know about everything in their world, like the render process |
| host and the render process, the render view host and the render view, the |
| render frame. The render frame host didn't exist then, but they came about |
| because of site isolation, really. But the render view, render view host became |
| this thing that represented the web page, and render view host in the browser, |
| render view in the renderer. And for any feature that required brokering out to |
| the browser to get access to something, essentially, the render view, the |
| render view host had to be participants in that because they had to be kind of |
| routers for that traffic. That's not very scalable. You start adding lots of |
| engineers, building lots of different features that need lots of different |
| capabilities. And these files start growing hairs and knowing about too many |
| things. And it becomes really hard to manage. |
| |
| 38:38 DARIN: On top of that, you start to have things where you say, gee, I |
| really wish this system could live in a different process. I mentioned the |
| networking process. All these events were coming through these different kinds |
| of crossroads of hell files. That was how I liked to call them. And in order to |
| take a subset of that and move it to a different process, now you have to redo |
| all that plumbing. And so the amount of layers of repeating yourself for |
| plumbing IPCs felt very out of control for - maybe how much work you had to do |
| to unlock a certain feature just seemed out of control. And so Mojo really was |
| inspired by how to eliminate a lot of that, to have a system that's more |
| endpoint to endpoint-based and all the flow of data would no longer be |
| dependent on all of these kinds of routing classes that handled all this |
| routing. And instead, you could just say, I have an endpoint. I have an |
| endpoint over here. This one's privileged. This one's not. And if I want this |
| one to live over here, I can do that. I can just move it around freely. And all |
| the routing is taken care of for me. And so that was a big change. And there's |
| many artifacts in the code base that sort of reveal the old system, right? In |
| many ways in which the product is built still resembles that old system. The |
| idea that if you look at a render view, render view host, there's an ID, a |
| routing ID associated with that. The concept of routing IDs is not needed in |
| Mojo anymore because the pipe itself, the Mojo pipe is like an identifier, in |
| some sense. Of course, so much of our system is built up around the idea that |
| tabs have these render view IDs, and frames have render frame IDs, and |
| processes have process IDs. And so many systems deal with those integers that |
| it's been unthinkable to not have those anymore. But in some sense, they aren't |
| really needed. If we were to build things from scratch, anew, with the Mojo |
| system, you wouldn't need it. |
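| |
| To make the endpoint-to-endpoint idea concrete, here is a toy TypeScript model. |
| It is an illustration only and is not Mojo's real API: the `Endpoint` class |
| and its `createPipe`, `send`, and `bind` methods are hypothetical names. It |
| shows a sender talking to "the other end" without any routing ID, and an |
| endpoint being bound after messages were already sent. |
| |
```typescript
// Illustrative model only; this is NOT Mojo's real API. A "pipe" here is just
// two connected endpoints, and whoever holds an endpoint can receive on it.
class Endpoint {
  private peer: Endpoint | null = null;
  private receiver: ((message: string) => void) | null = null;
  private pending: string[] = [];

  static createPipe(): [Endpoint, Endpoint] {
    const a = new Endpoint();
    const b = new Endpoint();
    a.peer = b;
    b.peer = a;
    return [a, b];
  }

  // Send to the other end. The sender never names a routing ID or a
  // destination process; the pipe itself identifies the conversation.
  send(message: string): void {
    this.peer?.deliver(message);
  }

  // Attach a receiver. Messages sent before anyone was listening are
  // delivered in order, so an endpoint can be handed off before it is bound.
  bind(receiver: (message: string) => void): void {
    this.receiver = receiver;
    for (const message of this.pending.splice(0)) {
      receiver(message);
    }
  }

  private deliver(message: string): void {
    if (this.receiver) {
      this.receiver(message);
    } else {
      this.pending.push(message);
    }
  }
}

// The "renderer" keeps one end; the other end is passed to whichever
// component should implement the feature, wherever that component lives.
const [rendererEnd, serviceEnd] = Endpoint.createPipe();
const received: string[] = [];

rendererEnd.send("lookup cookie A"); // queued: nobody has bound the far end yet
serviceEnd.bind((message) => received.push(message)); // far end binds later
rendererEnd.send("lookup cookie B"); // delivered immediately
```
| |
| Contrast this with a central router class that has to map integer routing IDs |
| to handlers for every feature: here the sender's code does not change when the |
| receiving end moves to a different owner. |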
| |
| 40:50 SHARON: Do you think if you were to redesign the whole |
| multi-process thing now, given how not just the internet is used, but also the |
| devices that are out there, I think you would probably want to have multiple |
| processes for things. But do you think there would be significant changes to |
| how the system overall is designed or put together if one were to start now? |
| |
| 41:16 DARIN: Well, yeah, I mean, it's always a question of where you're |
| starting from and what the constraints are that you're dealing with. We were |
| dealing with taking WebKit, which we didn't really have a lot of ownership of. |
| And it was open source, but we also had limited bandwidth to go and fork it and |
| manage that fork. And so to kind of try to create multi-process in the context |
| of this big significant piece that we really can't change or do much about |
| definitely limited us. So we had early ambitions and ideas. Like I said with |
| Charlie about site isolation, it wasn't going to be then that we could realize |
| it. It needed to be in a place where we had ownership of Blink. And not just |
| ownership, I mean capability to go and change it and to own the consequences of |
| changing it, to be able to manage that. We needed that, and we needed a lot of |
| other pieces. So if I'm starting over, I also have to - it's sort of like, |
| well, what am I starting from, right? But certainly, I feel like a lot of |
| lessons along the way inspired Mojo and the design there. And I feel like |
| that's a system that that sort of system would allow for an architecture that I |
| think would be better in many ways. And I'm very biased because that's |
| something I've worked on, and it was inspired by things I saw that weren't |
| great about the way that we built Chrome originally, although, in many ways, |
| the original setup with Chrome was born of pragmatism and minimalism in many |
| ways, trying to achieve - Chrome was very focused on being a product first, not |
| a browser construction kit. And so the idea that it needed to morph into a lot |
| of different things wasn't there in the beginning. In the beginning, it was, |
| you're just building a browser for Windows XP Service Pack 2. That's it, |
| nothing else. OK, now Vista. You got to worry about Vista, too, sorry. But just |
| that's it. And then later on, you add Mac. You add Android. You add Chrome OS, |
| iOS, Chromecast, et cetera, et cetera. And suddenly your world is very |
| complicated, and the needs of this system are way more. And the value of |
| malleability becomes higher. Look at the investment in views, et cetera, to |
| allow cross-platform UI, and then Mojo to allow a much more flexible system |
| under the hood. So it depends on your constraints in a lot of ways. |
| |
| 43:43 SHARON: Yeah, that makes sense. You said that even now in the code base, |
| you can see remnants or suggestions of how things used to be. So one of the |
| things that makes me think of is the IO |
| and UI threads because I feel like people used to talk about those more. And |
| now that's maybe changing a bit. So how come these are the only times we hear |
| the term "thread," really, in all of this? And what are the IO and UI threads? |
| Can you just tell us a bit about them? |
| |
| 44:20 DARIN: Oh, yeah, threading is a super fun topic. Now we have all these |
| task runner concepts and systems for giving you a task runner that's on an |
| isolated thread or whatever. And systems like Mojo allow you to not really have |
| to do a lot of plumbing to compensate for your choice of thread where you want |
| something to run. You can just indicate where it should go, and that happens. |
| But OK, originally, the design of the system was there was a UI thread, and |
| that's where all the UI lives. So the HWNDs, the Window handles and all the |
| Win32 stuff would go there. Input and painting come in there. Then there was - |
| so early on, I like to tell this story because one of the very first versions |
| of Chrome, we had just that UI thread sending data to renderer processes. And |
| the renderers would have their main thread where they ran JavaScript and |
| everything. So there was just these two threads in two different processes. |
| That was kind of it. In the browser process, there might have been the system |
| was probably doing a lot of other stuff with its networking stack and DNS |
| threads and such. But we weren't doing any. That wasn't us. That was probably |
| libraries we were using. So we had these two threads in two different processes |
| and an IPC channel. And so you send the input down to the renderer. The renderer |
| sends you a bitmap. OK, Google Maps. Imagine Google Maps. And imagine you're on |
| a single core, non hyper-threaded laptop. And you take your mouse, and you |
| click on that map, and you start dragging it around. And you expect to see the |
| image tiles moving around, right? And but for some reason, in Chrome, on that |
| device, [SNAP] nothing happens. You just move your mouse around, and the image |
| is stuck there. You're like, what's going on? It works fine on this other |
| laptop. Why not on this laptop? Turns out that on that device, in that setup, |
| the input stream was coming in. And basically, we were sending all this input, |
| and the input events were taking priority in the Windows Event pump over any |
| painting and/or reading from our IPC channels. And so, as a result, we were |
| just sending input events to the renderer. It was doing work, generating new |
| images. Those images were coming to the browser and backed up in some pipe and |
| not really being serviced, not really making their way. And so we kind of came |
| to the realization of several things. One is, we need to throttle that input |
| going to the renderer, but we also probably need to have some highly responsive |
| IO threads that could be dedicated to servicing the pipes, the channels, the |
| IPC channels, both in the browser and the renderer, actually. And so what was |
| born from that was the IO thread. And the IO thread was meant to be a highly |
| responsive thread for processing asynchronous IO. That's really what its name |
| should be - highly responsive, non-blocking IO thread - because the name IO |
| thread subsequently confused lots of people who wanted to do blocking IO on |
| that thread, like read a file or something. And we had to put in some |
| restrictions in the code to always let you know: there are runtime assertions |
| if you try to use certain blocking IO functions in `base` on the wrong |
| threads. And alongside that, we |
| invented something called the file thread. Said, this is the thread where you |
| read files. This is the thread where you write files because we don't want you |
| doing that on the UI thread because the UI thread needs to be responsive to |
| user input. So don't do blocking file IO on the UI thread. Don't do it on the |
| IO thread either. Do it on the file thread. So - |
| |
| 48:14 SHARON: That means they're all running in the browser process. |
| |
| 48:20 DARIN: In the browser process. The renderer got its own IO thread, too. |
| So the renderer would have its main WebKit thread and its IO thread. So it was |
| sort of a symmetric system. You had IPC channel, which was wrapped with a class |
| called `ipc_channel_proxy`. These things still exist in the code base. And |
| ChannelProxy was a way to use an IPC channel from a different thread. But the |
| IPC channel would be bound to the IO thread. All of those things I just |
| mentioned still exist, and Mojo was built on top of those channels. But the IPC |
| channel provides that underlying pipe. So an IPC channel is kind of |
| one-to-one with an OS pipe. Mojo has this concept of pipes which are more like |
| virtual pipes, and they're multiplexed over an OS pipe. |
| |
| 49:08 SHARON: OK. Yeah, because I think, yeah, now you hear non-blocking IO, |
| but I feel like maybe it's just what part of the code base you work in. But |
| running things, making sure things run on the right thread seems to be less of |
| a problem than it used to be. |
| |
| 49:27 DARIN: Yes. I think there's a lot of reasons for that, a lot of maturity |
| in the system. But also, I think some of the primitives are set up nicely so |
| that you can more easily have things running. In some ways, we used to have |
| this concept of, yeah, we very much had this. Still, in some ways, still have |
| this, but the idea that there is a UI thread, that there's an IO thread, and |
| that there is a file thread, and you pick which thread you're going to use. |
| Now, there's a whole pool of blocking IO threads. And you don't specifically |
| say, I want the file thread. You say, I have blocking IO I want to do, or give |
| me a - I want to put it on a thread pool. The IO thread used to be like where - |
| it may be still the case that some systems would just live there only because |
| maybe for latency reasons - like, cookies is a good example. We knew that we |
| wanted to be able to respond quickly to the renderer if it was querying a |
| cookie database. So we want to be able to service that directly on the IO |
| thread. And so there'd be a collection of these things that were maybe somewhat |
| sensitive, and but we wanted to have them live and be on the IO thread. And so |
| that idea of some things live on the IO thread was born. But I think those |
| things are few. And you really have to highly justify why you should be on that |
| thread. And so most things don't need to be. Just be on the UI thread. It's OK. |
| Or structure your work so that the part that is expensive and blocking goes to |
| a blocking queue. |
| |
| 51:00 SHARON: So partly for these threads, sometimes you see checks. Like, |
| check that this is running on a certain thread. But in general, is there a good |
| way to find out what process a certain block of code runs on? Because some |
| things we know - if you go to `third_party/blink`, whatever, you kind of know |
| that that's going to run in a renderer process, but just looking at the code, |
| like looking in code search, can you know where something is going to - |
| |
| 51:25 DARIN: [INAUDIBLE] very early on to try to deal with this. So like if you |
| go to the content directory, it's a good one to look at. You'll see a browser |
| directory, subdirectory, a renderer subdirectory, and a common directory. And |
| there's some other ones that have these familiar names. We use that structure |
| all throughout the code base for different components. So if you go to |
| `components/foo`, you'd see browser, renderer, common, maybe a subset of those, |
| depending. And so the idea is, if it's code that should only run in the |
| renderer, it lives in the renderer directory. If it's code that should only run |
| in the browser, it lives in the browser directory. If it's code that could run |
| in either, it lives in the common directory. So you'll see mojom definitions in |
| common directories because mojom is where you define the Mojo interface that's |
| going to be used in both processes. |
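| |
| The directory convention above can be turned into a quick heuristic. The |
| sketch below is only that: `processHintForPath` is a hypothetical helper, the |
| example paths are made up, and a directory name is a hint about intent, not a |
| guarantee about where code actually runs. |
| |
```typescript
type ProcessHint = "browser" | "renderer" | "either" | "unknown";

// Guess where a file's code is meant to run from the directory convention
// described above. This inspects path segments only; it knows nothing about
// the build or about which process actually loads the code.
function processHintForPath(path: string): ProcessHint {
  const segments = path.split("/");
  if (segments.includes("browser")) return "browser";
  if (segments.includes("renderer")) return "renderer";
  if (segments.includes("common")) return "either";
  return "unknown";
}

// Hypothetical paths following the components/foo layout from the discussion.
const hints = [
  "components/foo/browser/foo_service.cc",
  "components/foo/renderer/foo_agent.cc",
  "components/foo/common/foo.mojom",
].map(processHintForPath);
```
| |
| Code outside this layout, such as the GPU or network code discussed next, |
| would come back as "unknown" here and needs domain knowledge instead. |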
| |
| 52:12 SHARON: Oh. |
| |
| 52:12 DARIN: Yeah, we also have this code separation was also kind of born out |
| of this idea at one point in time that we might generate a totally different |
| binary for browser and renderer. And we used to have browsR. I'm calling it |
| that way because it didn't have an E at the end, so browsR and capital R, and |
| then rendR or something like this. And these were the two processes, the two |
| executables. And they could just compile whatever code they needed for their |
| purpose. Like WebKit would be in the renderer, and browser would have not |
| WebKit. It would have other things. And so these separate directories also |
| helped because it was like, that's the code that's going to go into that |
| process literally. And fast forward when Sandbox came along, the team was like, |
| nope, it's got to be the same executable for both browser and renderer and |
| should probably be called chrome.exe instead. And then that idea kind of that |
| they were separate executables and separate code kind of went away. And |
| instead, all the code for Chrome went into just this big DLL on Windows. And |
| the amount of shared code between the EXE and the DLL is very small, maybe a |
| little bit from base and such. But yeah, this idea of tagging the directory |
| structure in such a way that makes it obvious of like what process this code |
| belongs in, I think it was a big help, and it was a good choice. And it gives |
| people a little clarity of where they are and what they can use. |
| |
| 53:49 SHARON: What about for non-browser, non-renderer processes? What about GPU |
| or network? How do you know that this is running on the network process versus |
| this is how this part of this section of the code is interacting maybe with the |
| network process? |
| |
| 54:05 DARIN: Sometimes it can be a little bit of good luck. And sometimes it |
| might not be as obvious. I don't think this sort of - this structure that I |
| described was used for plugins, so there's a plugins directory, which may still |
| be around in some fashion or might be mostly gone. I don't know if when the |
| network process transition occurred, if this annotation was really maintained. |
| I actually don't think it was because I don't remember seeing network |
| directories. But I could be wrong. There might be some of them. I'm not as |
| familiar with the code for the networking process. But I think this convention |
| has helped us a lot and would be valuable to use in more places. For GPU, |
| there's a lot of symmetric code, probably code that runs in all processes, but |
| still this convention probably would make sense. But yeah, I think that for |
| some of those things, when you get like into the network world or you get into |
| the GPU world, you're also kind of in a more focused world, a smaller world. |
| And there's probably many other things you have to learn about that domain. |
| |
| 55:16 SHARON: Yeah, the GPU stuff seems very, very difficult. And I certainly |
| don't know how that works. OK, so - |
| |
| 55:23 DARIN: [INAUDIBLE] on there. |
| |
| 55:23 SHARON: Yes, so when it comes to process limits and performance and all |
| that kind of thing, so we have process limits, but you can go over them. And |
| can you tell us a bit about process limits, how they work, what happens when |
| you reach the limit? |
| |
| 55:39 DARIN: Hmm, yeah. So process limits, they exist to just have a reasonable |
| number of processes allocated for some definition of reasonable. At least early |
| on, that definition was based on how much RAM you had on your system. And as |
| computers got more and more RAM, that definition needed to be adjusted. We |
| assumed some overhead for individual processes. It's probably wise to put some |
| limits on how many we create. The allocation of those processes, it's best to - |
| kind of viewed as best to distribute the tabs across them as best as we can and |
| the origins across them now in the site isolation world to give more isolation |
| between different origins, to give more isolation between the different apps. |
| But at some level, you run out, and you need to now allocate across the ones |
| that are already in use. There's some hard rules around privileged content, |
| like `chrome://` URLs. They should not mix with ordinary web pages. But if |
| push comes to shove, we'll put a whole bunch of different origins' content |
| together into the same process, just ordinary web pages, not trusted content. |
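| |
| A toy version of that policy, purely for illustration: this is not Chromium's |
| real allocation algorithm, and `ProcessAllocator`, `ProcessInfo`, and |
| `softLimit` are hypothetical names. It gives each site its own process until a |
| soft limit is reached, then shares among ordinary processes, while never |
| mixing privileged and ordinary content. |
| |
```typescript
// A toy allocation policy, not Chromium's real algorithm.
interface ProcessInfo {
  id: number;
  privileged: boolean;
  sites: string[];
}

class ProcessAllocator {
  private processes: ProcessInfo[] = [];
  private softLimit: number;

  constructor(softLimit: number) {
    this.softLimit = softLimit;
  }

  assign(site: string, privileged: boolean): ProcessInfo {
    // A site that already has a process keeps using it.
    const existing = this.processes.find(
      (p) => p.privileged === privileged && p.sites.includes(site),
    );
    if (existing) return existing;

    // Only processes with the same privilege level may ever be shared.
    const compatible = this.processes.filter((p) => p.privileged === privileged);

    // Under the limit, or nothing compatible to share: create a new process.
    // The second case can exceed the limit, because the "do not mix
    // privileged and ordinary content" rule wins over the limit.
    if (this.processes.length < this.softLimit || compatible.length === 0) {
      const created: ProcessInfo = {
        id: this.processes.length + 1,
        privileged,
        sites: [site],
      };
      this.processes.push(created);
      return created;
    }

    // At the limit: share the compatible process hosting the fewest sites.
    const target = compatible.reduce((best, p) =>
      p.sites.length < best.sites.length ? p : best,
    );
    target.sites.push(site);
    return target;
  }
}
```
| |
| With `new ProcessAllocator(2)`, a third ordinary site ends up sharing one of |
| the first two processes, while a privileged page still gets a process of its |
| own even though that goes over the limit. |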
| |
| 56:52 SHARON: What happens if you just open a ton of tabs with a whole bunch of |
| different pages open, and you're basically stress testing what Chrome can do? |
| What happens in that case? |
| |
| 57:08 DARIN: It creates a lot of processes. It uses a lot of system resources. |
| It uses a lot of RAM. I think that this has been, I'd say, a battle for Chrome |
| across a lot of its lifetime and more recently, is how to manage these extreme |
| cases. And increasingly, these extreme cases are not actually odd or unusual. |
| People do a lot of browsing. People click on a lot of links. People create a |
| lot of tabs. People don't really close their browsers. They just leave it |
| running. And they come back the next day, and they continue where they left |
| off. And they open more tabs, and they do more surfing. And they just collect |
| and collect and collect tabs. And maybe they create more windows because maybe |
| they have some task that they're researching, and then they get interrupted and |
| they come back to it later. But they start to accumulate these windows full of |
| things that maybe they mean to come back to. And so that problem of just having |
| lots and lots of stuff and lots and lots of processes, well, Chrome under the |
| hood is like, I'll do my best. You wanted me to do all this stuff. I'm going to |
| do it. Let's see what I can do. And on a system like Windows or Mac where |
| there's a lot of RAM maybe, Chrome's thinking, OK, you wanted me to use the |
| RAM. I'm going to use the RAM. You wanted all those tabs. And then even on |
| those systems where maybe you're running out of RAM, but there's virtual |
| memory, there's disk space, all right, let's use it. Let's go. And so I think |
| it's really quite a challenge, actually. |
| |
| 58:44 DARIN: The original idea of Chrome was, yeah, make it possible for web |
| pages to take advantage of the resources of your computer. Let it allow web |
| pages to be more capable because of it, and not be - the old world prior to |
| Chrome was single-threaded browser, all web pages on the same thread. Like, you |
| could have a dual core machine, and it wouldn't matter. It wouldn't make your |
| browser any faster. But now with Chrome, no problem. You got dual core. You got |
| eight cores, whatever you got. We can have all of those things saturated with |
| work and allow you to multitask on the web and do lots of amazing things. But I |
| think it's still a resource management challenge for the browser because on one |
| hand, you want to give that capability, but on the other hand, you also don't |
| want to - how much power should you be using? What if the laptop's not plugged |
| into the wall? What if it's just running on battery? What is the right resource |
| utilization for Chrome? I don't think that's a solved problem at all. There's |
| various systems in place to throttle the resource utilization of background |
| tabs. Timers, for a long time now, have been throttled, but throttling other |
| things. I know there was a lot of research done into freezing tabs, so |
| literally suspending them and not letting them do any work. But with that comes |
| challenges of what do you do with all the IPCs that are inbound to those |
| processes? They're backing up on pipes, and that's not great. If you unfreeze |
| them, now there's a blast of IPCs coming in that they suddenly have to service. |
| That doesn't seem great. Do you drop those IPCs on the floor? Probably not. |
| Now, the process would be in some weird state, and you might as well have to |
| just kill it, which, of course, is the case on dev systems like Chrome OS and |
| Android. They do have to just kill the processes because of the limits of those |
| devices. So, yeah, I've been a proponent of just being aggressive about killing |
| processes on desktop in general. I think there's some balance there that's |
| right. It's probably not right to keep all the tabs open, all the processes |
| open. We should be, I think, judicious about what we keep open, keeping the |
| workload reasonable, instead of making it like a, oh, yeah, I will rise to the |
| challenge of dealing with thousands of tabs or thousands of web pages across |
| 100 processes, even if - maybe it's somehow possible through heroic effort to |
| make Chrome capable of doing such a thing in an efficient manner. But does it |
| mean we should? Who needs 1,000 tabs all running around doing work at once, you |
| know? You don't. You really don't. Nobody does. |
| |
| 61:32 SHARON: So this is kind of the basis of the goal for Arc, right, which |
| is I think it closes your tabs overnight or something. And Arc is what you work |
| on now and is a Chromium-based browser. So for embedders of Chromium, let's say |
| the browser kind, how much control do you have over how processes are used, |
| allocated, if you embed content? Like, are you able to just say, oh, I don't |
| want a network process. I will just put this all in the browser process. Can |
| you do that? |
| |
| 62:07 DARIN: Hm. You can do anything you want. It's just code. No, but as a |
| browser embedder, as a Chromium embedder, you're shipping Chromium. So Arc |
| browser ships a copy of Chromium. And Arc browser includes changes to Chromium |
| as needed to make it work. Of course, that's possible. Of course, you could |
| change a lot of stuff and make a big headache to manage it all, right? So |
| there's some natural limits. You don't want to change too many things, or else |
| you won't be able to really manage it going forward. You want to take updates |
| from the mainline, incorporate improvements, but you also want to preserve some |
| differences that you've made. Well, how do you do that? And so change |
| management is a challenge. So there's a natural limit to how much you want to |
| alter the base functionality. Instead, it's - anyways, the product like Arc is |
| not so much differentiating on the basis of Chromium code or content layer. |
| It's not really its purpose or goal. Its purpose is to differentiate at the UI |
| layer and with things like what you mentioned and other things as well. Yeah, |
| and so, of course, if one were to go down the path of could we optimize process |
| model better, that would be in the realm of things that would be great to |
| contribute to Chromium, so that it could be part of the mainline and therefore |
| not be something that you have to maintain yourself. That's how I would |
| approach it as a Chromium embedder. |
| |
| 63:47 SHARON: OK, that makes sense. Yeah, if it's in Chromium, you don't have |
| to worry about the updates, and you just get - |
| |
| 63:53 DARIN: Turns out there's an army of engineers who would make sure it's |
| never broken. You just gotta write some tests. |
| |
| 63:59 SHARON: Oh, wow. |
| |
| 63:59 DARIN: [INAUDIBLE] those tests. |
| |
| 64:05 SHARON: So with non-browser embedders of Chromium, like, say, Electron, I |
| don't know how familiar you are with that, but they presumably would have |
| different needs out of how Chromium works, basically. I don't know if you know |
| what they're doing with any processes. |
| |
| 64:25 DARIN: I mean, I've used VS Code. That's a famous example of a Chromium |
| embedder that you might not realize is using Chromium or built on top of it, |
| that one might not realize that. But if you open up Task Manager and you look |
| at VS Code, you'll see all the glorious processes under there. And so have they |
| or Electron or any of these, have they altered things there? Maybe. I mean, |
| there's some configuration one might do. If you're building an application |
| that's very single purpose, like VS Code or Slack or - what are some other good |
| examples, there's quite a few that are built on top of Chromium - they're more |
| single purpose towards a single app, right? Of course, VS Code is pretty |
| sprawling with all the things you can do in it, but at the same time, it could |
| be the case that they don't have the same security concerns. They don't have |
| the same idea of hosting content from so many different sources. So maybe they |
| would tune the process model a little differently. Maybe they would decide, I |
| don't really need as many processes because I'm managing things in a different |
| way. It's not a browser. |
| |
| 65:34 SHARON: Yeah, you're not handling all of the untrusted JavaScript of the |
| web that you have to be - |
| |
| 65:42 DARIN: Right, I'm not so worried about this part of my application dying |
| and then wanting to keep the rest of it still running or something because that |
| would still be considered a bug because part of my app died. And so some of the |
| reasons for multi-process architecture might be a little different. |
| |
| 66:01 SHARON: Right. And more just for fun, having worked on now an embedder of |
| Chromium, how has that experience been in terms of decisions that were made |
| when you were putting together the multi-process architecture? Are there things |
| where you were like, oh, no, past me, if you'd done this differently, this |
| would be easier now. |
| |
| 66:20 DARIN: I would say I'm very thankful for Mojo IPC, made it very easy to - |
| one thing that I've found is that it's possible to do a lot of amazing things |
| on top of Chromium without actually modifying Chromium. And the Content API and |
| Mojo IPC makes a lot of that really possible. So it's a very flexible system. |
| There's a lot of really great hooks that let you interact with the system all |
| the way from extending the renderer to extending the browser. And to be able to |
| build stuff and layer it on top of a stable system is amazing. When I was |
| working on building an Android browser, I built a tracking prevention ad |
| blocking system for Android and was able to do it without modifying Chromium. I |
| thought that was amazing. |
| |
| 67:19 SHARON: How are you using Mojo? Because Mojo is typically going between |
| the processes. So if you're not really changing how the processes work, what do |
| you use Mojo for? |
| |
| 67:26 DARIN: Oh, well, in that case, it was used to communicate a rule set down |
| to the renderer. And then at the renderer level, I would inject a stylesheet to |
| do content blocking or to apply network filtering at the Blink layer. So there |
| are a combination of Blink Public APIs and Content Public APIs. There are |
| actually enough hooks to be able to filter network requests and insert |
| stylesheets that would apply display none to a set of DOM elements. So but to |
| do that efficiently, it was necessary to bundle up those rules into a blob of |
| memory that you would just send down to the renderer process, to all renderer |
| processes, so they'd have it available and could just directly inspect |
| like a big hash map of rules. And so being able to - like I said before, when |
| the IPC system is just like - when it's decoupled like that with Mojo, it makes |
| it possible to kind of graft on these systems that they interact with APIs over |
| here, and that endpoint talks to some endpoint over here in the browser |
| process, which can have, like I said, like a rules data that it might want to |
| send over and that kind of thing. And so being able to build those kinds of |
| systems, and I think if you look at just how a lot of features in Chrome are |
| built, they're built very similarly, too. They build on top of the Content API |
| that provides the various hooks. They build on top of Blink API. Sometimes a |
| feature needs to live in the renderer and the browser process. Like autofill is |
| always the classic example of this early on in Chrome or password manager. |
| These are systems that need to crawl the DOM. They need to poke at the DOM. |
| They need to understand what's there. They need to be able to insert content or |
| put overlays in, or they need to be able to talk to the browser where the |
| actual database is, all that kind of stuff, and looking at different load |
| events and various things to know in the lifecycle of the page. So, yeah, I'd |
| say I'm thankful for a lot of these design choices along the way because I |
| think it's led to Chromium being so useful to so many people in so many |
| different ways. Obviously, it empowered building a really great browser and a |
| really great product, but it also has empowered a lot of follow-on innovation. |
| And I think that's pretty cool. |
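
Editor's sketch (not from the talk): the pattern Darin describes, where the browser side bundles the rules into one blob and each renderer inspects it like a hash map, could look roughly like the following. This is a minimal illustration in Python rather than Chromium C++. Every name here (`pack_rules`, `RendererRules`, `should_block`, `stylesheet`) is hypothetical, and JSON stands in for whatever memory layout the real system used. It does not use Mojo, the Content API, or the Blink API.

```python
import json
from urllib.parse import urlsplit


def pack_rules(blocked_hosts, hidden_selectors):
    """Browser side (hypothetical): serialize the rule set once into one blob."""
    payload = {
        "blocked_hosts": sorted(blocked_hosts),
        "hidden_selectors": list(hidden_selectors),
    }
    return json.dumps(payload).encode("utf-8")


class RendererRules:
    """Renderer side (hypothetical): read-only view over the received blob."""

    def __init__(self, blob):
        payload = json.loads(blob.decode("utf-8"))
        # Hash-based containers give constant-time lookups per request.
        self._blocked_hosts = frozenset(payload["blocked_hosts"])
        self._hidden_selectors = tuple(payload["hidden_selectors"])

    def should_block(self, url):
        """Return True if the URL's host or any parent domain is blocked."""
        host = urlsplit(url).hostname or ""
        parts = host.split(".")
        return any(
            ".".join(parts[i:]) in self._blocked_hosts
            for i in range(len(parts))
        )

    def stylesheet(self):
        """Build one CSS rule that hides every listed selector."""
        if not self._hidden_selectors:
            return ""
        selectors = ", ".join(self._hidden_selectors)
        return selectors + " { display: none !important; }"


# The blob is built once and would be handed to every renderer process.
blob = pack_rules({"ads.example"}, [".ad-banner", "#sponsored"])
rules = RendererRules(blob)
```

In this sketch the blob is built once on the browser side, and each renderer only parses and reads it, which mirrors the "send it down once, look it up locally" idea from the discussion.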
| |
| 69:53 SHARON: It is pretty cool. So Chrome was released in 2008. It is |
| now 2023. So as math tells, it's been 15 years. We like numbers that end in 5 |
| and 0. So - I don't know - it's very cool. I remember when Chrome came out. And |
| I don't know. Do you have any - |
| |
| 70:08 DARIN: Yeah, for me, it's more like 17 years because we started in 2006. |
| |
| 70:14 SHARON: Right. So do you have any general reflections on all the stuff |
| that's changed in that time? |
| |
| 70:22 DARIN: It's wild. I have a higher density of memories from the early |
| days, too. It's amazing. I guess that's how memories work when everything's new |
| and changing so much. But yeah, no, I'm very thankful for the journey and very |
| thankful to have been part of it. And it was a lot of fun to work on. I mean, |
| prior to Chrome, when I was working on Firefox, I did a little exploration on |
| adding like a multi-process thing to Firefox, which I thought - just, I was |
| learning about how to do IPC, and I was learning - but I was doing it for what |
| purpose back then. I think I was just toying around with DCOM. I don't know if |
| anybody knows what COM is, but Microsoft's Component Object Model that was like |
| all the rage back then. And it allowed for like integrating different languages |
| together. WinRT is all built on top of this stuff now. But anyways, Mozilla had |
| its own version of COM called XPCOM. And wouldn't it be cool if you could have |
| a component that - so you could have components back then that were built in |
| JavaScript, and you could talk to them from C++, or they were built in C++ more |
| commonly, and you talked to them from JavaScript. But wouldn't it be cool if |
| one endpoint could be in another process? So that was something I was playing |
| around with in 2004 when I was still working on Firefox. And then when the |
| Chrome opportunity came along - maybe that was 2005 - I don't know. But when the |
| Chrome opportunity came along, I was like, all right, let's do it. IPC channel |
| was basically those ideas, but kind of more polished slightly. |
| |
| 72:02 SHARON: OK. Yeah, very cool. I mean, when I first started working on |
| Chrome stuff, someone on my team said, any time you change something in base, |
| that pretty much is going to get run anytime the internet gets run, which I |
| thought was super crazy for just some random software engineer like me to be |
| able to do, right? But - |
| |
| 72:20 DARIN: And now it's even more than that if you think about [INAUDIBLE] |
| code and [INAUDIBLE]. |
| |
| 72:20 SHARON: Yeah, all the stuff. So do you ever just think about it, and |
| you're just like, oh, my god, wow. |
| |
| 72:26 DARIN: Yeah, it's pretty amazing. |
| |
| 72:31 SHARON: So crazy. |
| |
| 72:31 DARIN: It is one of the special things about working on Chromium, is |
| that, yes, you can have such an amazing impact with the work that you do there. |
| |
| 72:38 SHARON: Have there been any cases - these are just now unrelated |
| miscellaneous questions. But in terms of surprising usages of Chromium, be it |
| like maybe the base or the net stack or something, have there been any cases |
| where you were really surprised by like, oh, this is being used here? |
| |
| 72:56 DARIN: Well, for sure, the first time I heard about Electron, I was like, |
| oh, this is not a good idea. House of cards, you know? It just seems like it's |
| such a complicated system to build your app on top of, right? But at the same |
| time, I totally get it and appreciate it, and I understand why people would |
| reach for it. There's so much good sauce there, so much good stuff and so |
| many - there is a lot of really good infrastructure there to build on. Early |
| on, I kind of imagined more that things like Skia and V8 and some of the other |
| libraries would be the thing that people would make lots of extra use out of, |
| right? So I didn't quite imagine people taking the browser's framework like |
| this. And we absolutely didn't build it with that purpose. Pretty much every |
| choice along the way was highly motivated by making Chrome team's life better. |
| Like, Content API was, when we came to the realization we needed it, it was |
| like we desperately need it. Just the complexity of Chrome was getting |
| unwieldy. We needed to cleave part of it and say, that is this part. We needed |
| to somehow draw a line in the sand and say, this is the set of concerns over |
| here. And so the idea that all of this could be used for other purposes is |
| cool, but it was never really in the initial cards. And I came from working on |
| Mozilla, which was, in many ways, browser construction kit first, product |
| second. So Chrome was very much like, let's go the other extreme - product |
| first, maybe a platform later. And to see it be this platform now is pretty |
| cool. But it's pretty far from where we started. |
| |
| 74:50 SHARON: Yeah, kind of - I watched some of the earlier talks you gave |
| about the multi-process architecture and Content, not Chrome, came up a bunch. |
| And this is, things, I guess, like Electron are the result of that, right? |
| Where - |
| |
| 75:01 DARIN: Yeah, it's pretty wild. Yeah, I mean, so Mozilla built this very |
| elaborate system called XUL, or X-U-L, which was an XML language for doing UI. |
| And it's very interesting, intellectually interesting, maybe different than |
| XAML. XAML is way better probably in many ways. But XUL was kind of XHTML |
| minus, minus, with a bunch of stuff added on for like UI things. And then it |
| had this thing called XBL, which is a bindings language that you could do |
| custom bindings. And so anyways, then you build your application in JavaScript |
| and Firefox, Mozilla, it was all built this way. So it was like a web page |
| hosting a web page. The outer web page was like this XML DOM. The product |
| engineers working on that, in order to get some modern Windows sort of thing |
| come through, they had to basically go through the rendering engine team to get |
| them to do something. And so it really greatly limited the ability for product |
| team to actually build product. And there were so many sacred cows around the |
| shape of Gecko and how that structure was, that while this cross-platform |
| toolkit seemed glorious at first, it ended up being handcuffs for product |
| engineering, I think. So, yeah, Chrome started out with Win32 native UI for |
| browser UI. You have all the choices you want to make, browser front-end |
| engineers. You also have to build a lot of code, but no cross-platform |
| toolkits. Views came later. |
| |
| 76:43 SHARON: Right. Well, this was great. Thank you very much. Normally, we do |
| a shout-out section at the end. Do you have anything - normally, it's like a |
| Slack channel or something like the Mojo Slack channel. I think in this case, |
| it's maybe - I don't know if there is a specific thing, but is there anything? |
| |
| 76:57 DARIN: Shout-out to all the team and the engineers making everything |
| great. |
| |
| 77:03 SHARON: All right. |
| |
| 77:03 DARIN: Yeah. |
| |
| 77:03 SHARON: Cool. Awesome. Well, thank you very much for chatting with us. |
| That was super cool, lots of really interesting background and good |
| information. So thank you very much. |
| |
| 77:15 DARIN: Yeah, a pleasure. Thank you so much for having me. |
| |
| 77:21 SHARON: Talk about threads, so IO, UI thread. |
| |
| 77:27 DARIN: Do I get credit for the confusingly named IO thread? |
| |
| 77:27 SHARON: OK, all right, we can cover that. That's cool. Yeah, why is it |
| called IO thread when it doesn't do IO? |