[P] Detection of Safe Landing Location from Terrain Images
The purpose of this project is to devise a vision system that will return the safest area for an autonomous rocket powered lander to land given a terrain image.
I started working on this project a while ago, gave up, and am hoping to restart it soon. However, I think that the approach I’ve taken previously is suboptimal.
Current Work: https://github.com/GerardMaggiolino/SEDS-Aquarius-Vision
Some promising (not garbage) results on test data.
I felt that semantic segmentation was overly expressive: pixel-wise labelling isn't necessary, since I only need to find a large, contiguous safe region. Additionally, I'm creating my own data set, and wanted an approach that doesn't require an extensive amount of labelling time or extremely precise labels. I settled on something like sliding-window, with ordinal classification over each patch. The paper I used as a reference for ordinal classification is in the repo.
I'm classifying individual patches of an image as safe or unsafe. Ideally this would be a regression problem, but I can't label that precisely. Binary classification would be poor, since many images sit right on the boundary. A compromise is ordinal classification, where predicting a "5" for safeness when the true label is "1" is penalized much more heavily than predicting a "2" on the same image. I'm manually labelling terrain images between 1 and 5; I wrote a short script that lets me label about six images per minute.
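For anyone curious what ordinal classification looks like in practice, one common scheme (the extended binary / cumulative encoding, as in Frank & Hall; this may differ in detail from the paper in the repo) turns a 1-to-5 label into four binary "is the label greater than k?" targets, each trained with a binary loss. Misranking by several categories then flips several targets at once, which is exactly the graded penalty described above:

```python
import numpy as np

def ordinal_encode(label, num_classes=5):
    """Encode label k in {1..num_classes} as num_classes-1 cumulative
    binary targets: target[j] = 1 if label > j+1."""
    return (label > np.arange(1, num_classes)).astype(float)

def ordinal_decode(probs, threshold=0.5):
    """Predicted class = 1 + number of thresholds the model says are passed."""
    return 1 + int(np.sum(np.asarray(probs) > threshold))
```

So a label of 3 becomes `[1, 1, 0, 0]`, and predicting 5 when the truth is 1 gets all four binary targets wrong, while predicting 2 gets only one wrong.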
With a small network using strided depth-wise convolutions and only 411 greyscale images, without augmentation, I achieved some promising results. Random classification would land within one category of the true label (e.g. an output of 2 for a label of 1 counts as correct) about 52% of the time; I achieved 96% after a few minutes of training. I'm hoping this improves as I increase the data set size, and augmentation with rotations and horizontal flips could multiply the data by 8.
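The ×8 factor comes from the symmetries of a square patch: 4 rotations, each with an optional flip. A minimal sketch of that augmentation (function name is mine):

```python
import numpy as np

def dihedral_augment(img):
    """Return the 8 symmetries of a square patch:
    4 rotations of 90 degrees, each with and without a horizontal flip."""
    out = []
    for k in range(4):
        rot = np.rot90(img, k)
        out.append(rot)
        out.append(np.fliplr(rot))
    return out
```

For terrain viewed from above there's no canonical "up", so all 8 variants are physically plausible training examples.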
The problem: classifying over a 1500 x 1500 image (about 2.25 megapixels) takes around 0.2 seconds on my 6-core MacBook Pro. The lander will likely have less computational power, and inference needs to be near real-time. This is with NO sliding-window overlap, so good regions that straddle patch boundaries could be missed entirely.
I'm not sure what to do next, and I'm unaware of what architectures are common for problems like this. At the end of the day, I want to return a single, precise location for where to land.
I'm thinking of two approaches. First, a YOLO-esque model, where safe regions are labelled with bounding boxes. Problem: safe regions are hardly ever perfect rectangles. Second, a semantic segmentation model with limited / no upsampling, from which the center of the largest circle inscribed in the polygons corresponding to safe regions could be returned. Problem: speed might still be a concern, and labelling would be more challenging.
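For the second approach, the "center of the largest inscribed circle" step is cheap once you have a safe/unsafe mask: a Euclidean distance transform gives, for every safe pixel, the distance to the nearest unsafe pixel, and its argmax is the inscribed-circle center. A sketch under the assumption that scipy is available (function name is mine):

```python
import numpy as np
from scipy import ndimage

def safest_landing_point(safe_mask):
    """Given a boolean mask of safe pixels, return (row, col, radius) of
    the largest inscribed circle: the safe pixel farthest from any
    unsafe pixel or image border."""
    # Pad with an unsafe border so distances also respect the image edge.
    padded = np.pad(safe_mask.astype(bool), 1, constant_values=False)
    dist = ndimage.distance_transform_edt(padded)
    r, c = np.unravel_index(np.argmax(dist), dist.shape)
    return r - 1, c - 1, dist[r, c]
```

The distance transform on a coarse (downsampled) mask is fast, so this post-processing shouldn't be the bottleneck; the segmentation forward pass would be.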
In both of the above, labelling would be more time-consuming and difficult, potentially resulting in lower-quality models. Additionally, I'm under the impression that far less data is needed to train a good patch classifier (1-5 ordinal regression) than to train an SSD or FCN.
Any advice would be greatly appreciated! Sorry for the long post, and thank you so much if you’ve made it this far.