Let me start by saying I agree with you.
To give a bit more context: the system isn't looking for a picture of "Hitler" so much as an exact match for the hash of a specific file that is already known to be a picture of Hitler.
This is why it is hypothetically possible for a completely different file to have the same hash. The chance of that happening by accident is far smaller than one in a trillion, though.
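For a rough sense of scale, here is a back-of-the-envelope estimate. It assumes a 128-bit hash such as MD5 behaves like a random function for accidental collisions (deliberately engineered MD5 collisions are a separate, known problem). The counts $N$ (known hashes) and $M$ (files scanned) are made-up illustrative numbers, not figures from any real system.

```latex
% Chance that one unrelated file accidentally matches one specific 128-bit hash
P(\text{match}) = 2^{-128} \approx 2.9 \times 10^{-39}

% Expected accidental matches when M files are checked against N known hashes
\mathbb{E}[\text{accidental matches}] \approx \frac{N \cdot M}{2^{128}},
\qquad N = 10^{6},\; M = 10^{12}
\;\Rightarrow\; \approx 2.9 \times 10^{-21}
```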
So the guts of this aren't looking at the visual content of the images, just the hash of the file itself. If that hash matches, and the file passes whatever other automated checks are in place, a human then looks at the image to confirm it matches the expected content. For example, my image in the "Community" tab has an MD5 hash of 0a37cb3d3aab7b471c12b4555fcc94fb. If you download it and run md5 on it, you should get the same hash.
Code:
$ md5 /Users/turtle2472/Desktop/turtle2472.jpg
MD5 (/Users/turtle2472/Desktop/turtle2472.jpg) = 0a37cb3d3aab7b471c12b4555fcc94fb
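A minimal sketch of the exact-hash matching described above, assuming the simplest possible setup: a plain text file of known digests and a whole-line lookup. The file names, `known_hashes.txt`, and the `check` helper are hypothetical and only for illustration; a real system adds thresholds, more checks, and review steps on top of this.

```shell
#!/bin/sh
# Hypothetical sketch of exact-hash matching: a file is flagged for human
# review only if its MD5 appears, exactly, in a list of known hashes.
# All file names here are made up for illustration.
set -eu

# Print just the hex digest: md5sum on Linux, md5 -q on macOS/BSD.
md5_of() {
  if command -v md5sum >/dev/null 2>&1; then
    md5sum "$1" | cut -d ' ' -f 1
  else
    md5 -q "$1"
  fi
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf 'known bad content' > "$tmp/flagged.bin"
printf 'holiday photo' > "$tmp/harmless.bin"

# The "database": one lowercase hex digest per line.
md5_of "$tmp/flagged.bin" > "$tmp/known_hashes.txt"

# grep -F fixed string, -x whole-line match, -q exit status only.
check() {
  if grep -Fqx "$(md5_of "$1")" "$tmp/known_hashes.txt"; then
    echo "match: $(basename "$1") -> queue for human review"
  else
    echo "no match: $(basename "$1")"
  fi
}

r1=$(check "$tmp/flagged.bin")
r2=$(check "$tmp/harmless.bin")
echo "$r1"
echo "$r2"
```

The lookup only answers "is this exact byte sequence on the list?", which is why a single changed byte is enough to avoid a match.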
I "modified" that image by running it through ImageOptim, and now the hash is different:
Code:
$ md5 /Users/turtle2472/Desktop/turtle2472.jpg
MD5 (/Users/turtle2472/Desktop/turtle2472.jpg) = fd1949d244aaf3c1f77fd78d707b74d3
However, the optimized image looks exactly the same as the original.
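Here is a small reproduction of the same effect. It assumes the optimizer changed at least one byte of the file (the post doesn't say exactly what ImageOptim altered). The bytes below are a stand-in, not a real JPEG, and appending a single byte plays the role of the optimizer.

```shell
#!/bin/sh
# Sketch: any byte-level change yields a new MD5, even when nothing a
# person would notice has changed. The file contents are a stand-in.
set -eu

# Print just the hex digest: md5sum on Linux, md5 -q on macOS/BSD.
md5_of() {
  if command -v md5sum >/dev/null 2>&1; then
    md5sum "$1" | cut -d ' ' -f 1
  else
    md5 -q "$1"
  fi
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf 'stand-in for image bytes' > "$tmp/original.jpg"
cp "$tmp/original.jpg" "$tmp/optimized.jpg"
printf '\n' >> "$tmp/optimized.jpg"   # one extra byte

before=$(md5_of "$tmp/original.jpg")
after=$(md5_of "$tmp/optimized.jpg")
echo "before: $before"
echo "after:  $after"
```

Both digests are 32 hex characters, but they share no predictable relationship, so the modified file no longer matches anything computed from the original.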
The worry is that we all say, "Sure, scan for child porn, because it is bad and no one should ever have it on their phone." Once hashing files on a device with permission is accepted, it can be expanded to hash other file types, say video, and check whether they match known child porn. But what if a video is of a protest, and a "bad government" wants all of those files, and the people who have them, "questioned"? Now it can check those hashes and do whatever it wants with the results.
Yeah, it is a bad path, but it might be the least offensive option that still maintains some degree of privacy.
Maybe we should split this part of the thread out to a new thread.