Classifying architectural elements using AI (ML and neural networks): problem statement
I've recently finished IBM's Advanced Data Science specialization, and as a capstone project I thought about identifying the architectural elements present in an image: is it a column? A vault? Or a bell tower? Let's see what we can achieve!
The problem
Several times in my life I have found myself visiting a building and not being able to identify what I had in front of me: I could describe its shape, its colors, and even its function, but I did not know its name.
For example, suppose I tell you that in front of me there is a staircase leading down to a subterranean room, and that this room's purpose was to serve as a funerary shelter for a wealthy person 3000 years ago. What is this called?
The idea is for an algorithm to identify it as the entrance to a hypogeum, or rock-cut tomb.
Can you imagine how fun it would be to learn architecture with a guide always at your side? Just take a picture and you will know what you are looking at.
Audience:
- Schools
- Institutions (town halls, museums...)
- Architecture students
- People interested in architecture or historical landmarks in general
Possible user-facing interfaces or features:
- Web application to upload pictures
- Mobile application to take pictures with
- Show an explanation of the identified element
The dataset
As is almost always the case, the second thing you need to do is look for data: is there a dataset available? Does it contain the tags you need, or must you label it manually? Or do you have to create the dataset from scratch?
I would say that if you are doing a personal project, creating a dataset from scratch or labeling one manually is a costly process, so my recommendation is to reconsider and search a bit more before going down that road.
In my case, I was able to find the Architectural Heritage Elements Dataset by Jose Llamas, licensed under Creative Commons Attribution and created for a paper called Classification of Architectural Heritage Images Using Deep Learning Techniques. I only found the paper while writing this post; it would have been amazing to find it earlier :_(
The dataset contains more than 10,000 images classified into 10 categories. This should do the trick for a proof of concept.
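Before loading anything, it can be worth checking how those images are spread across the categories. The snippet below is a minimal sketch, not part of the original project: it assumes the dataset is unpacked as one subfolder per category (the same layout the loading code in this post relies on), and count_images_per_category is a name I made up for illustration.

```python
import os
from collections import Counter


def count_images_per_category(dataset_dir):
    """Count the files inside each category subfolder of dataset_dir.

    Assumes one subfolder per category; anything that is not a
    directory at the top level is ignored.
    """
    counts = Counter()
    for category in sorted(os.listdir(dataset_dir)):
        category_dir = os.path.join(dataset_dir, category)
        if not os.path.isdir(category_dir):
            continue
        counts[category] = sum(
            1
            for name in os.listdir(category_dir)
            if os.path.isfile(os.path.join(category_dir, name))
        )
    return counts
```

Calling count_images_per_category(DATASET_DIR) and printing the result with .most_common() gives a quick view of whether some categories have far fewer images than others, which is useful to know before training anything.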
Exploration
Once we have data, we can finally start writing code, so spin up your favourite development environment and play around. In my case the data was images, so Python and matplotlib were a good first option.
import os

import matplotlib.image as img
import matplotlib.pyplot as plt
import numpy as np

# Placeholder (assumption): point this at the folder that holds one subfolder per category
DATASET_DIR = "dataset"

# Load every image together with the name of its category folder
images = []
labels = []
categories = os.listdir(DATASET_DIR)
for c in categories:
    imgs = os.listdir(os.path.join(DATASET_DIR, c))
    for i in imgs:
        image = img.imread(os.path.join(DATASET_DIR, c, i))
        images.append(image)
        labels.append(c)
labels = np.array(labels)
And then, with a bit more Python code, we have images!
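That extra bit of code is not shown in this post, so here is one way it could look. It is only a sketch under my own assumptions: plot_samples is a hypothetical helper, and it simply draws whatever images and labels it receives in a grid using matplotlib's subplots and imshow.

```python
import math

import matplotlib.pyplot as plt


def plot_samples(images, labels, n_cols=4):
    """Draw the images in a grid, using each label as the subplot title."""
    # At least one row, so an empty list still produces a valid figure
    n_rows = max(1, math.ceil(len(images) / n_cols))
    fig, axes = plt.subplots(
        n_rows, n_cols, figsize=(3 * n_cols, 3 * n_rows), squeeze=False
    )
    # Hide ticks and frames on every cell, including unused ones
    for ax in axes.flat:
        ax.axis("off")
    # Fill the grid row by row with the images and their labels
    for ax, image, label in zip(axes.flat, images, labels):
        ax.imshow(image)
        ax.set_title(str(label))
    return fig
```

With the images and labels lists loaded above, something like plot_samples(images[:8], labels[:8]) followed by plt.show() displays the first eight pictures with their category names.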
Next up...
The next post will be about models, traditional machine learning vs neural networks and extracting new features from the data.