[D] Tool to shuffle training datasets for more randomness via hardlink (Windows)

https://github.com/TomArrow/FolderHardlinkShuffler

Note: This tool does not copy or link any files itself. It only generates a .bat file, which you can inspect before you run it.

Purpose: Let’s say you have a video you’re using for training ESRGAN, but you want more randomness in the individual batches, so that it doesn’t train on the same scene for any extended period of time, but always has a good mix of all kinds of scenes. At the same time you don’t want to shuffle the original data, and you also want to be able to apply the same shuffle to both the low resolution and high resolution data.

Here’s how:

Open this tool and select the folder with your image sequence. Then select a target folder. Set the filename prefix and the number of zeros to pad the numbers with, and hit the Generate button.

It will take some time to calculate the shuffle (dunno why it takes so long, might improve someday). Then it asks you for a place to save a .bat file.

This .bat file has two variables at the top that you can change as you wish, srcFolder and dstFolder. After that it has one line for each file in the source folder, each creating a hardlink to that file in the new folder via mklink /h.
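To make the structure of that .bat file concrete, here is a minimal Python sketch that builds the same kind of script. This is not the tool's actual code, and the exact layout of its output is an assumption on my part. The function name build_bat_lines and its parameters (prefix, zero_fill, seed) are hypothetical. Only the srcFolder and dstFolder variables and the use of mklink /h come from the description above.

```python
import os
import random


def build_bat_lines(filenames, src_folder, dst_folder,
                    prefix="img", zero_fill=6, seed=None):
    """Return the lines of a batch script that hardlinks files in shuffled order.

    Hypothetical helper: the layout of the generated script is an assumption,
    not the real output of FolderHardlinkShuffler.
    """
    shuffled = sorted(filenames)            # start from a deterministic order
    random.Random(seed).shuffle(shuffled)   # linear-time in-place shuffle

    lines = [
        "@echo off",
        f'set "srcFolder={src_folder}"',    # the two variables you can edit
        f'set "dstFolder={dst_folder}"',
    ]
    for index, name in enumerate(shuffled):
        ext = os.path.splitext(name)[1]
        new_name = f"{prefix}{index:0{zero_fill}d}{ext}"
        # mklink /h <new link> <existing file>
        lines.append(
            f'mklink /h "%dstFolder%\\{new_name}" "%srcFolder%\\{name}"'
        )
    return lines


if __name__ == "__main__":
    demo = build_bat_lines(["a.png", "b.png", "c.png"],
                           r"C:\frames\hr", r"C:\frames\hr_shuffled", seed=1)
    print("\n".join(demo))
```

Because the shuffle is baked into the script as plain mklink lines, you can read the whole thing before running it, which is the point of generating a .bat instead of linking directly.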

If you execute this .bat file, you will end up with a second folder of numbered files that are hardlinks to the original files, but in a completely new randomized order. To the filesystem these are real files, not merely shortcuts. So while each file exists only once on your hard drive, it exists twice in your folder structure: once in the original folder in the original order, and once in the new folder in shuffled order. You can delete the entire shuffled folder without losing the original files. Either way it will not take up any extra space, because it’s not an actual copy.
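If you want to convince yourself of the hardlink behaviour before running the script on real data, here is a small sketch using Python's os.link, which creates the same kind of hard link as mklink /h. The function name hardlink_demo and the file names are made up for illustration, and everything happens in a temporary folder.

```python
import os
import tempfile


def hardlink_demo():
    """Create a file, hardlink it, delete the link, check the original."""
    with tempfile.TemporaryDirectory() as root:
        original = os.path.join(root, "frame_0001.png")
        shuffled_dir = os.path.join(root, "shuffled")
        os.mkdir(shuffled_dir)
        link = os.path.join(shuffled_dir, "img000000.png")

        with open(original, "wb") as f:
            f.write(b"pixel data")

        # Same kind of link as: mklink /h <link> <original>
        os.link(original, link)

        # Both names point at the same data on disk.
        same_file = os.path.samefile(original, link)
        link_count = os.stat(original).st_nlink

        # Deleting the shuffled name leaves the original untouched.
        os.remove(link)
        with open(original, "rb") as f:
            survived = f.read() == b"pixel data"

        return same_file, link_count, survived


if __name__ == "__main__":
    print(hardlink_demo())
```

One thing to keep in mind: hard links only work within a single volume, so the shuffled folder has to be on the same drive as the original folder.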

You can apply the same .bat file to your low resolution samples. Just change the srcFolder and dstFolder variables in the .bat script to point at the low resolution folders and run it again. The filenames of course have to be identical to those in the high resolution folder.
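The reason this works is that the shuffle is computed once and then reused as a fixed old-name to new-name mapping. Here is a rough Python sketch of that idea, linking directly with os.link instead of going through a .bat file. The names make_mapping and apply_mapping are hypothetical and this is not how the tool itself is implemented.

```python
import os
import random


def make_mapping(filenames, prefix="img", zero_fill=6, seed=0):
    """Compute the shuffle once as a list of (old_name, new_name) pairs."""
    shuffled = sorted(filenames)
    random.Random(seed).shuffle(shuffled)
    return [
        (name, f"{prefix}{i:0{zero_fill}d}{os.path.splitext(name)[1]}")
        for i, name in enumerate(shuffled)
    ]


def apply_mapping(mapping, src_folder, dst_folder):
    """Hardlink every file from src_folder into dst_folder under its new name."""
    os.makedirs(dst_folder, exist_ok=True)
    for old_name, new_name in mapping:
        os.link(os.path.join(src_folder, old_name),
                os.path.join(dst_folder, new_name))


# Usage sketch (the folder paths are placeholders):
#   mapping = make_mapping(os.listdir("hr"))
#   apply_mapping(mapping, "hr", "hr_shuffled")
#   apply_mapping(mapping, "lr", "lr_shuffled")
```

Since both folders go through the same mapping, file number N in the shuffled high resolution folder and file number N in the shuffled low resolution folder always come from the same original frame.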

Hope this will be useful to someone. Of course it can be used for anything, but I made it specifically for this purpose.

submitted by /u/PMmeYOURrareCONTENT