[D] On pornographic, NSFW and non-consensual images in the ImageNet dataset. What’s the path forward?
Dear Reddit-ML community,
In the ImageNet dataset (classes 445, n02837789, ['bikini, two-piece'] and 459, n02892767, ['brassiere, bra, bandeau']), there are many images that are verifiably pornographic (you can see the porn star's webpage in the pic!), shot in non-consensual or voyeuristic settings, and some that entail underage nudity (see collage here). This has deep ramifications, not only in the legal realm of downloading and storing these images, but also as a trickle-down effect on models trained on this dataset. For example, if you are an artist making or selling neural art, the unethical nature of the seed images could sully the sanctity of the art (see: https://openreview.net/forum?id=HJlrwcP9DB ).
The question now is: What’s the best path forward? Image deletion and replacement? Do chime in with your thoughts!
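For those weighing the deletion/replacement option, here is a minimal sketch of what excluding the two synsets from a local ImageNet copy might look like. It assumes the standard ILSVRC2012 layout (one folder per WordNet ID under the train split); the paths, the `quarantine_synsets` helper, and the quarantine-rather-than-delete choice are all my own illustrative assumptions, not anything the dataset maintainers prescribe.

```python
import shutil
from pathlib import Path

# Assumption: local ImageNet train split organised as one folder per WordNet ID
# (the usual ILSVRC2012 directory structure). Adjust paths to your own copy.
IMAGENET_TRAIN_DIR = Path("/data/imagenet/train")        # hypothetical path
QUARANTINE_DIR = Path("/data/imagenet_quarantine")       # hypothetical path

# The two synsets called out in the post (ILSVRC2012 classes 445 and 459).
PROBLEMATIC_SYNSETS = {
    "n02837789",  # bikini, two-piece
    "n02892767",  # brassiere, bra, bandeau
}

def quarantine_synsets(train_dir: Path, synsets: set, dest: Path) -> None:
    """Move the flagged synset folders out of the training split rather than
    deleting them outright, so the decision stays reversible and auditable."""
    dest.mkdir(parents=True, exist_ok=True)
    for wnid in sorted(synsets):
        src = train_dir / wnid
        if src.is_dir():
            shutil.move(str(src), str(dest / wnid))
            print(f"moved {wnid}: {src} -> {dest / wnid}")
        else:
            print(f"{wnid} not found under {train_dir}, skipping")

if __name__ == "__main__":
    quarantine_synsets(IMAGENET_TRAIN_DIR, PROBLEMATIC_SYNSETS, QUARANTINE_DIR)
```

Quarantining (rather than deleting) is just one possible choice; whether replacement images should be sourced, and under what consent standards, is exactly the open question here.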
PS: I had written to the creators of the dataset (waay before the ImageNet Roulette thingy), but received no replies.
submitted by /u/VinayUPrabhu