Some shortcuts I learned on Windows

Tita of Data
5 min read · Apr 26, 2020


[Image: filtering options in Windows File Explorer]

Even as I jumped into the #100DaysOfCode challenge, I still had to work on projects with my company. If you have read my previous blog, How to Get Away with No-code, you would know that I’m in the process of scraping a bunch of photos from the Internet, and, as a mortal being, I have been trying to do it the simplest way I can; that is, I have been trying to avoid code in the whole process. Ha!

After scraping all these photos and downloading them, I now have a bunch of folders (more than 100) with different filename conventions, each containing a mix of file formats (jpeg, tiff, png, xml). They have different file formats and filename conventions for two reasons:

  1. I scraped Bing and Google, and the results from Google/Bing image searches are typically of mixed formats
  2. The datasets I downloaded and requested from researchers/academics usually come from a previous deep learning or computer vision project they worked on, hence the xml files, which contain the annotations, bounding boxes, etc.

So now, the next challenge is consolidating all of these images into fewer folders, with the filename convention that I want.

The first project was, in essence, an early detection system. When the target object is detected in a photo captured by any of the cameras, a warning or alert has to be triggered and displayed for our staff to see. In short, I need to build an Object Detection system for this client, and to save costs, my tools and references should be open-source as much as possible. Make no mistake, since Day 1 my very cool, tech-savvy, progressive boss told me, in front of the whole Engineering Team, that I should not worry about or be limited by costs, and that they would be willing to pay to get a system up and running should I see the need for it. He mentioned this because in my first proposal for which Object Detection algorithm to use, my conclusion was to use SSD instead of Faster R-CNN, primarily because of the limitations of my computer. I even asked the team if we had HPC at our disposal, and to no one's surprise, they said no.

As a Hello World for my first IoT project with this company, instead of jumping right into Object Detection, which really had a steep learning curve for me (I couldn't even install TensorFlow GPU; I kept getting errors in cmd), I decided to try the following first:

  1. Non-ML solution: I created a program where the input is an image and the output is a count of the target objects in that image. The solution is purely based on image processing: Gaussian blur, image thresholding, noise filtering, etc. Basically, the goal is to a) remove all the noise and b) separate the object in the foreground from the background (a sketch of this kind of pipeline follows after this list). The result was very poor: it counted a high number of the target object even when the image contained none.
  2. DL solution run locally: I used TensorFlow and Keras to do Image Classification. The solution uses MobileNet (SSD) to determine which class the input image belongs to. The input to the model is an image and the output is the prediction (the class/label of the object) and the probability (in percent), i.e., how confident the model is that the image belongs to that class/label (a rough sketch of this step also follows after the list). The result was quite decent: it could classify between the 2 classes almost perfectly. The only downside is that training takes a very long time and took a huge toll on my machine. Whenever I ran the training program, I could literally hear the fans in my laptop spin, and it heated up so fast that it hurt to rest my hand on the touchpad. Ha!
  3. Google Cloud Platform and the AutoML Cloud Vision API: In order to avoid melting the company laptop into non-existence, the next solution was to train the model on the cloud instead. Google Cloud Vision has a very intuitive UI, so this part went very smoothly. The first iteration had partially decent results (75% precision, 75% recall), and of course that was because I had a dataset of fewer than 100 images, some of them low quality. So again, the next step was to import more images.
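To make the first approach concrete, here is a minimal sketch of the kind of image-processing pipeline I mean, assuming OpenCV (pip install opencv-python); the blur kernel and the minimum contour area are illustrative values, not the exact ones I used:

```python
# Minimal sketch of a pure image-processing counter (no ML).
# All threshold values here are illustrative, not the real project values.
import cv2

def count_objects(image_path):
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # a) Remove noise with a Gaussian blur
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)

    # b) Separate the foreground object from the background (Otsu threshold)
    _, thresh = cv2.threshold(blurred, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

    # Drop small specks that survived the blur, then count what's left
    contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    objects = [c for c in contours if cv2.contourArea(c) > 500]
    return len(objects)

print(count_objects("sample.jpg"))
```

This sketch also shows why the counts were unreliable: any blob that survives blurring and thresholding gets counted, whether or not it is actually the target object.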
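And for the second approach, here is a rough sketch of what the inference step looks like, assuming a tf.keras MobileNet-based classifier already fine-tuned on the two classes; the model file, class names, and input size below are hypothetical placeholders:

```python
# Rough sketch of the local classification step. The model path, class
# names, and 224x224 input size are placeholders, not the project values.
import numpy as np
import tensorflow as tf

CLASSES = ["target_object", "not_target_object"]  # hypothetical labels

model = tf.keras.models.load_model("my_mobilenet_classifier.h5")

def classify(image_path):
    img = tf.keras.preprocessing.image.load_img(image_path,
                                                target_size=(224, 224))
    x = tf.keras.preprocessing.image.img_to_array(img)
    x = tf.keras.applications.mobilenet.preprocess_input(x[np.newaxis, ...])
    probs = model.predict(x)[0]
    best = int(np.argmax(probs))
    # Output: the predicted class/label and the model's confidence in percent
    return CLASSES[best], probs[best] * 100

label, confidence = classify("test.jpg")
print(f"{label}: {confidence:.1f}%")
```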

Across all 3 solutions, one thing was necessary: a consolidated dataset with good-quality images (no corrupted images, and the images should be realistic, i.e., they should show what the object really looks like in practice). In an attempt to make this very tedious consolidation job easier and faster, here are some of the tricks I learned:

  1. Extracting files from multiple folders
  2. Filtering the files for what I need. In my case, I had to type “*kind:image” in the search bar, or click the Search tab at the top, then click Kind and choose among the dropdown options; the search bar autofills with the corresponding filter.
  3. Renaming all the files in a folder in sequential number order, because the object detection code kept running into this error:

FileNotFoundError: [Errno 2] No such file or directory
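These are Explorer shortcuts, but the same consolidation can be sketched in a few lines of Python. This is a minimal sketch under assumed folder names (scraped and dataset are placeholders), not the finished automation I mention below:

```python
# Sketch of the consolidation the Explorer shortcuts do by hand:
# gather images from many scrape folders into one place and rename them
# in sequential number order. Folder names are hypothetical placeholders.
import shutil
from pathlib import Path

SOURCE = Path("scraped")   # parent folder of the 100+ scrape folders
DEST = Path("dataset")     # consolidated output folder
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".tiff"}  # skip the xml annotations

DEST.mkdir(exist_ok=True)
count = 0
for path in sorted(SOURCE.rglob("*")):
    if path.suffix.lower() in IMAGE_EXTS:
        count += 1
        # e.g. dataset/000001.jpg, dataset/000002.png, ...
        shutil.copy(path, DEST / f"{count:06d}{path.suffix.lower()}")
print(f"Consolidated {count} images into {DEST}/")
```

Renaming everything to a zero-padded sequential scheme should also avoid the FileNotFoundError above, assuming the training code expects consecutive numeric filenames.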

With these simple shortcuts, I was able to consolidate everything within minutes. The next goal is to automate the whole process of scraping and consolidating using Python. When I get there, I’ll share the whole process. Ciao!

Written by Tita of Data

titaofdata.xyz | Product | Data | IoT | Automation | Decision Science
