Google Photos Isn’t Getting E2EE Anytime Soon
Google and Apple both like to talk about privacy when it comes to your data. When it comes to their respective photo storage services, however, they take two very different approaches with a major impact on how some of your most precious digital belongings are handled. The way Google enables many of the features users expect has left the company in a difficult spot when it comes to security. End-to-end encryption (E2EE) is one of the strongest ways consumers can keep their data safe from prying eyes, and while Apple now lets you fully encrypt your photo data end-to-end, Google does not. In fact, there are significant hurdles between Google and offering true E2EE for your photos.
Apple’s focus on on-device processing of your photos to enable key features is what allows it to end-to-end encrypt your photo data. By the nature of end-to-end encryption, data encrypted this way cannot be accessed by the company storing it on its servers. The data is encrypted by the device of origin with a dedicated encryption key, sent to the server, and stored as is. The server operator does not hold a working key to decrypt the data. When the user requests the data, it is sent back in the same state in which it was received, and the user decrypts it with their own key.
Since the server operator has no means to decrypt the data, it cannot access it to perform operations such as scanning. Thus, if you want to end-to-end encrypt your data, those actions cannot be performed in the cloud. They need to be done locally on the device, since the device does have access to the encryption keys required to decrypt the data.
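The round trip described above can be sketched in a few lines. This is a toy illustration only: the XOR "cipher" stands in for a real scheme like AES-GCM, and real E2EE systems use per-user key hierarchies rather than a single key. The point is the shape of the flow, namely that the server only ever holds opaque bytes.

```python
import secrets

def xor_cipher(data: bytes, key: bytes) -> bytes:
    # Toy one-time-pad-style XOR, purely for illustration; a real E2EE
    # system would use a vetted authenticated cipher such as AES-GCM.
    return bytes(b ^ k for b, k in zip(data, key))

photo = b"raw photo bytes"
device_key = secrets.token_bytes(len(photo))   # key never leaves the user's devices

ciphertext = xor_cipher(photo, device_key)     # encrypted before upload
server_storage = {"photo_001": ciphertext}     # the server stores only opaque bytes

# On another of the user's devices (which has the synced key), decrypt locally.
restored = xor_cipher(server_storage["photo_001"], device_key)
assert restored == photo
```

Because `device_key` is never uploaded, nothing in `server_storage` can be read by the operator, which is exactly why server-side scanning becomes impossible.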
To reduce the need to access user data on its servers, Apple has prioritized on-device machine learning, particularly for photos. Identification of faces, objects, and so on is done entirely on your devices. The results are saved as metadata for the photos, which is then synced across devices via the cloud. This scanning happens across your iPhone, iPad, and Mac, not just a single device, speeding up the process and distributing the workload.
For example, over the course of a day, I may take many photos of people and objects. Facial matching does not occur right away, as doing so would hurt battery life and performance. Instead, on mobile platforms, on-device processing runs when your device is plugged in, charging, and not in use. The photos may still sync with iCloud in the meantime and become accessible on other devices, such as your Mac. Those photos may be scanned on-device on your Mac, and the resulting metadata, such as recognized objects and faces, is then synced back across your devices, becoming accessible on your mobile devices before they have had a chance to do the scanning themselves. Plenty of things can get in the way of this process, such as low battery on the iPhone preventing photos from syncing with iCloud, though you can force syncing from the Photos app.
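One way to picture this distributed scanning is as a shared metadata store where the first device to finish a scan publishes the results, so the others can skip the work. This is a hypothetical sketch of the idea, not Apple's actual sync protocol; the function and field names are invented for illustration.

```python
# Stand-in for the cloud-synced, per-photo metadata store.
cloud_metadata: dict[str, dict] = {}

def publish_scan_results(photo_id: str, results: dict) -> None:
    # First device to finish scanning wins; later results are ignored.
    cloud_metadata.setdefault(photo_id, results)

def needs_scanning(photo_id: str) -> bool:
    # A device only scans photos that have no published metadata yet.
    return photo_id not in cloud_metadata

# The Mac scans a photo overnight and syncs the results up...
publish_scan_results("IMG_0042", {"faces": ["Alice"], "objects": ["dog"]})

# ...so the iPhone sees the metadata and skips its own scan.
print(needs_scanning("IMG_0042"))            # False
print(cloud_metadata["IMG_0042"]["faces"])   # ['Alice']
```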
A similar process applies when you edit your photos. Nearly all of the edits you can make to your photos in iCloud are non-destructive, meaning they can be undone and don’t require saving an extra copy of the photo. That’s because every edit you make is recorded as a set of instructions rather than a hard change to the image. When you make an edit, the photo in the cloud isn’t swapped out for an edited version; instead, the edit instructions are sent separately from the photo itself and synced across your devices. When you view a photo on your Apple device, the original photo is downloaded separately from those edits, which are applied on the fly according to the instructions. This is why, when you open an already-edited photo in Apple’s Photos app, the appropriate sliders are already moved to match the previous edits. By contrast, each editing session in Google Photos is a fresh start, and Google Photos often requires you to save edits as a copy of the original image, creating duplicates.
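The instruction-based model is easy to sketch: the original pixels are stored once, and edits are a replayable list applied at render time. This is a simplified illustration with invented names and fake "pixel" values, not Apple's actual edit format.

```python
from dataclasses import dataclass, field

@dataclass
class Photo:
    pixels: list                                 # stand-in for the raw image data
    edits: list = field(default_factory=list)    # instructions, synced separately

def apply_edits(photo: Photo) -> list:
    """Render the photo by replaying its edit instructions on the fly."""
    rendered = list(photo.pixels)
    for op, value in photo.edits:
        if op == "brightness":                   # shift every pixel value
            rendered = [p + value for p in rendered]
        elif op == "contrast":                   # scale around the midpoint
            rendered = [(p - 128) * value + 128 for p in rendered]
    return rendered

original = Photo(pixels=[100, 120, 140])
original.edits.append(("brightness", 10))        # an instruction, not a new copy

print(apply_edits(original))    # [110, 130, 150]
print(original.pixels)          # original untouched: [100, 120, 140]
```

Because the `edits` list still holds the `("brightness", 10)` entry, an editor reopening this photo knows exactly where to position its sliders, and undoing an edit is just removing an instruction.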
In summary, the original photos, metadata, and edits are generated locally on your device and then synced across your devices separately. When you view a photo on another device, the data is assembled and the edits are rendered to show you the appropriate image and its edits, along with tagged faces and identified objects.
That’s all nifty, but why would you want your data to be encrypted end-to-end? End-to-end encryption provides a number of benefits compared to other encryption architectures. Today, it’s standard to have data encrypted in transit and at rest, meaning your data is encrypted before leaving your device but decrypted when it arrives at the server, where it’s then encrypted again. The server provider therefore holds the encryption keys to decrypt the data while it’s at rest on its servers. That may not seem like an issue if you trust the company, but there are risks regardless.
By law, these companies are required to hand over the user data they hold when presented with a warrant by law enforcement. These warrants aren’t always targeted at an individual user but sometimes at groups of people, most of whom are innocent. If the company has the encryption keys to decrypt the data, it may be required to provide the decrypted data. If the company has no means to decrypt the data, however, it can only provide encrypted data.
That’s not all. There is always a risk of penetration by external parties, like hackers. With data encrypted in transit and at rest, the data is encrypted while in storage, but the keys to decrypt it are not far from reach. An experienced hacker may be able to acquire both the encryption keys and the stored content. With end-to-end encryption, even if a hacker gets access to the data stored on the server, they get only encrypted data and cannot obtain the keys to decrypt it. Brute-forcing the encryption is theoretically possible, but it is vastly more difficult and expensive, and therefore far less likely.
Finally, the company itself will have access to the data and will use that access. In the case of photos, as we’ve discussed, this happens for server-side processing like face or object recognition. The more often this occurs, the higher the risk that methods like memory attacks could expose the data while it’s temporarily decrypted as part of these processes.
The company may also perform scans on your data that you might not want. This is a hotly debated topic, but companies perform scans of various types to sniff out possible instances of Child Sexual Abuse Material (CSAM), for example. Depending on your stance on surveillance, you may be for or against this. Either way, the company needs access to your data for CSAM detection to take place on the server. Similar detection can happen on-device, but we’ll get to that in a bit.
What changes would Google need to make to Google Photos to enable E2EE? Google is heavily reliant on server-side processing. If Google wants to enable end-to-end encryption, many operations will need to move on-device so they can be performed without server-side access to the data.
Face and object detection is a big one. All face matching and object recognition in Google Photos is done on the server after a photo is uploaded. Since these scans seem to happen shortly after upload, the images must be encrypted upon leaving your phone, decrypted upon reaching the server, re-encrypted for storage, then decrypted to be scanned before being encrypted once more for longer-term storage. Apple’s facial recognition and object detection occur on the device while it is plugged in, charging, and not in use. Google may need to take a similar approach with Google Photos, a monumental feat, as we’ll discuss soon.
CSAM detection is an even bigger hurdle than face and object detection. It’s something many customers aren’t even aware is happening. When Apple tried to implement CSAM detection, it planned to rely solely on hash matching: deriving a unique fingerprint from each photo and comparing it to a database of known CSAM hashes. This method notably does not inspect photo content directly. Even so, the backlash was significant enough that Apple backtracked. Google evaded criticism largely because most people don’t know it does similar hash matching on photos stored in Google Photos, just on the server. Since then, Google has been in an even more difficult position, as it can’t move its CSAM detection to the device without risking similar pushback.
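The compare-against-a-blocklist shape of hash matching looks roughly like this. Note this is a deliberate simplification: production systems such as Microsoft's PhotoDNA use perceptual hashes that survive resizing and re-encoding, whereas the cryptographic SHA-256 used below only matches byte-identical files. The hash values and image bytes here are made up for illustration.

```python
import hashlib

# Hypothetical blocklist of fingerprints of known-flagged images.
known_bad_hashes = {
    hashlib.sha256(b"known-flagged-image-bytes").hexdigest(),
}

def matches_known_hash(image_bytes: bytes) -> bool:
    # Only the image's fingerprint is compared; the content itself
    # is never inspected or classified.
    return hashlib.sha256(image_bytes).hexdigest() in known_bad_hashes

print(matches_known_hash(b"an ordinary holiday photo"))   # False
print(matches_known_hash(b"known-flagged-image-bytes"))   # True
```

This is why hash matching is often framed as less invasive than model-based scanning: an image that isn't already in the database can never be flagged, no matter what it depicts.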
Google doesn’t just do hash matching, however. Since 2022, Google has used more direct methods of detecting CSAM, including scanning images with models designed to identify instances of child nudity. Google says its models are trained not to flag images of children in the bath or outside, which raises plenty of questions on its own. These models would also need to move on-device. If people didn’t want hash matching on their devices, they definitely won’t want AI models trained to identify naked children scanning their photos on their devices.
The biggest obstacle between Google Photos and end-to-end encryption is its open nature. Google Photos is available on any Android device, in web browsers, on low-powered Chromebooks, and on iOS devices. As a result, it isn’t deeply integrated into the operating systems it runs on. To perform its AI functions, it needs hardware capable of running AI models efficiently, plus low-level access to that hardware so operations can be scheduled intelligently without slowing down the user’s device, killing the battery, or heating it up. There is no guarantee that every device Google Photos runs on meets these requirements. For that reason, if Google were to move operations on-device, it would likely be through deep integration on its own Pixel hardware or partnerships with key manufacturers, and the functionality would not be available on all devices. This is not a problem for iCloud Photos on iOS, as Apple controls the devices the service runs on and can deeply integrate it into its operating systems, allowing AI operations to kick in only when optimal for the user experience.
Believe it or not, there is one simple way Google could still offer end-to-end encryption in Google Photos despite these issues: a compromise. You could encrypt your photos end-to-end, but doing so would cost you every feature enabled by cloud processing. No face matching. No object recognition. No CSAM detection. No Video Boost. No advanced search. The experience would be stripped down to the bare minimum: uploading photos, viewing them on other devices, and performing simple (likely destructive) edits.
After all that, you might be feeling pretty down about Google Photos. Let me help a bit and go over some of the benefits you receive as a Google Photos user thanks to processing being done server-side.
The biggest benefit is speed. Flagship smartphones now have dedicated AI cores, allowing efficient execution of ML models, but a large server will always have access to even better hardware capable of faster, better-scheduled operations. When you take a photo with Google Photos, assuming you have a network connection, it will be whisked away to the server, processed, and returned within a couple of minutes. This means you’ll quickly see those images appear in advanced search results and people albums. By contrast, iCloud Photos can only match this if you have another device on and ready to handle the processing in lieu of a server. For most people, that means waiting until you hit the hay for the scanning to occur, ready for you the next day. While I don’t normally need advanced search to find a photo I took an hour ago, it does mean you could have a shared smart folder set to collect photos containing certain people, to which photos are added automatically and nearly live.
A related benefit is efficiency. Performing processing on-device can be costly; it’s one reason Apple builds so much performance overhead into its devices. While Apple optimizes the timing of this scanning to limit the impact on daily battery life and heat, it isn’t always perfect. By offloading processing to the server, the device can keep things cool.
Finally, there’s accessibility. Because Google Photos doesn’t rely on device hardware, it can be accessed on a wider variety of devices. Similarly, the lack of E2EE makes logging into your Google account and viewing your photos anywhere a breeze. With iCloud Photos with E2EE turned on, you need to jump through a few hoops to grant a browser a key to decrypt the data before you can view it.
Google and Apple have architected their photo management systems in completely different ways. One has emphasized privacy and security while still offering many of the features people look for. The other has emphasized accessibility, creating a system that can be used across multiple platforms. As a result of that cross-platform design, it’s nearly impossible for Google to offer the security that’s becoming increasingly important without major compromises. Time will tell if, and how, Google handles these hurdles.