Comment on page
Surface Crack Detection and Localization - Texas Instruments TDA4VM
A computer vision system that detects and localizes surface cracks in concrete structures for predictive maintenance, using a TI SK-TDA4VM Starter Kit.
Created By: Naveen Kumar

Cover Image
A 50-year-old bridge collapsed in Pittsburgh (Pennsylvania) on January 28, 2022. There is only one reason why a sturdy structure such as a concrete bridge could suddenly collapse: wear and tear.

Bridge Collapsed
Concrete structures generally start deteriorating after about 40 to 50 years. For this reason, overlooking signs of wear can result in severe accidents, which is why the inspection and repair of concrete structures are crucial for safeguarding our way of life. Cracks are one of the important criteria used for diagnosing the deterioration of concrete structures. Typically, a specialist would inspect such structures by checking for cracks visually, sketching the results of the inspection, and then preparing inspection data based on their findings. An inspection method like this is not only very time-consuming and costly but it also cannot accurately detect cracks. I built a surface crack detection application using machine learning in this project. A pre-trained image classification model is fine-tuned using the Transfer Learning with the Edge Impulse Studio and deployed to the Texas Instruments SK-TDA4VM Evaluation board which detects surface cracks in real-time and also localizes them.
Why do we want to localize the detection using an image classification model? Can't we use the object detection model? Yes, we can use the object detection model but we would need to add bounding boxes to thousands of samples manually. Existing object detection models may not be a good choice to auto-annotate these cracks since they are trained on definite shape objects. Repurposing the classification model for localizing the detection saves a lot of effort and still would be able to identify the regions of interest.
The CNN (convolutional neural networks) with GAP (Global Average Pooling) layers that have been trained for a classification task can also be used for object localization. GAP is a pooling operation designed to replace fully connected layers in classical CNNs. The idea is to generate one feature map for each corresponding category of the classification task in the last MLP conv layer. A GAP-CNN not only tells us what object is contained in the image - it also tells us where the object is in the image, and through no additional work on our part! The localization is expressed as a heat map (class activation map) where the color-coding scheme identifies regions that are relatively important for the GAP-CNN to perform the object identification task.
Since we wanted to use compact and portable yet powerful hardware, we will be using a Texas Instruments SK-TDA4VM Starter Kit for edge AI vision systems. It is powered by a TDA4VM processor that enables 8 TOPS of deep learning performance and hardware-accelerated edge AI processing. Also, we would need a USB web camera and a monitor (for displaying inferencing results).

Hardware
To install an Operating System, download the latest ti-processor-sdk-linux-sk-tda4vm-etcher-image.zip image and flash it to the SD card using the Balena etcher tool. After flashing, insert the SD card into the SD slot on the development board and turn on the power. We can view the boot log by connecting the UART cable to the computer and using a serial port communications program (CoolTerm or minicom) at a 115200 baud rate. We can use the same UART console to execute shell commands or we can connect an Ethernet cable to set up a remote ssh session. Since we will be using a custom TFLite model trained using the Edge Impulse Studio, we need to set up model compilation tools on a Linux machine. We are using a Ubuntu 18.04 Virtualbox guest on a macOS host. Please execute the commands below to set up the compilation environment.
$ git clone https://github.com/TexasInstruments/edgeai-tidl-tools.git
$ cd edgeai-tidl-tools
$ git checkout 08_05_00_11
$ source ./setup.sh
$ export DEVICE=j7
$ source ./setup.sh
The datasets were downloaded from Mendeley Data (Concrete Crack Images for Classification). The dataset contains various concrete surfaces with and without cracks. The data is collected from multiple METU Campus Buildings. The dataset is divided into negative and positive crack images for image classification. Each class has 20,000 images with a total of 40,000 images with 227 x 227 pixels with RGB channels.

Datasets
To differentiate crack and non-crack surface images from the other natural world scenes, 25,000 randomly sampled images for 80 object categories from the COCO-Minitrain, a subset of the COCO train2017 dataset, were downloaded. The data can be accessed from the links below.
We need to create a new project to upload data to Edge Impulse Studio.

The data is uploaded using the Edge Impulse CLI. Please follow the instructions to install the CLI here: https://docs.edgeimpulse.com/docs/cli-installation.
The downloaded images are labeled into 3 classes and saved into the directories with the label name.
- Positive - surface with crack
- Negative - surface without crack
- Unknown - images from the 80 objects
Execute the following commands to upload the images to the Edge Impulse Studio. The datasets are automatically split into training and testing datasets.
$ edge-impulse-uploader --category split --label positive positive/*.jpg
$ edge-impulse-uploader --category split --label negative negative/*.jpg
$ edge-impulse-uploader --category split --label unknown unknown/*.jpg
We can see the uploaded datasets on the Edge Impulse Studio's Data Acquisition page.

Data Acquisition
Go to the Impulse Design > Create Impulse page, click Add a processing block, and then choose Image, which preprocesses and normalizes image data, and optionally reduces the color depth. Also, on the same page, click Add a learning block, and choose Transfer Learning (Images), which fine-tunes a pre-trained image classification model on the data. We are using a 160x160 image size. Now click on the Save Impulse button.

Create Impulse
Next, go to the Impulse Design > Image page and set the Color depth parameter as RGB, and click the Save parameters button which redirects to another page where we should click on the Generate Feature button. It usually takes a couple of minutes to complete feature generation.

Feature Generation
We can see the 2D visualization of the generated features in Feature Explorer.

Now go to the Impulse Design > Transfer Learning page and choose the Neural Network architecture. We are using the MobileNetV2 160x160 1.0 transfer learning model with the pre-trained weight provided by the Edge Impulse Studio.

Model Selection
The pre-trained model outputs the class prediction probabilities. To get the class activation map, we need to modify the model and make it a multi-output model. To customize the model, we need to switch to Keras (expert) mode.

We can modify the generated code in the text editor as shown below.

Editor
We will connect the 2nd last layer which is a GAP layer to the Dense layer (a layer that is deeply connected with its preceding layer) with 3 neurons ( 3 classes in our case). We will be using this Dense layer's weights for generating a class activation map later.
base_model = tf.keras.applications.MobileNetV2(
input_shape = INPUT_SHAPE, alpha=1,
weights = WEIGHTS_PATH
)
last_layer = base_model.layers[-2].output
dense_layer = Dense(classes)
output_pred = Softmax(name="prediction")(dense_layer(last_layer))
For the class activation map, we need to calculate the dot product of the last convolutional block output and the final dense layers' weight. The Keras Dot layer does not broadcast the multiplier vector with the dynamic batch size so we can not use it. But we can take advantage of the Dense layer which internally does the dot product of the kernel weight with the input. There is a side effect in this approach, the Dense layer adds up bias weight to each dot product. But this bias weight is very small and does not change the final normalized values of the class activation map so we can use it without any problems.
conv_layer = base_model.layers[-4].output
reshape_layer = Reshape((conv_layer.shape[1] * conv_layer.shape[2] , -1))(conv_layer)
dot_output = dense_layer(reshape_layer)
We need to resample the dot product output to the same size as the input image (160x160) so that we can overlay the heat map. We use the UpSampling2D layer for this purpose.
transpose = Permute((2, 1))(dot_output)
reshape_2_layer = Reshape((-1, conv_layer.shape[1] , conv_layer.shape[2]))(transpose)
SIZE = (int(INPUT_SHAPE[1] / conv_layer.shape[2]),
int(INPUT_SHAPE[0] / conv_layer.shape[1]))
output_act_map = UpSampling2D(size=SIZE, interpolation="bilinear", data_format="channels_first", name="activation_map")(reshape_2_layer)
model = Model(inputs=base_model.inputs, outputs=[output_pred, output_act_map])
Also, we will be training the model from the last two convolutional blocks and freezing all layers before that.
TRAINABLE_START_IDX = -12
for layer in model.layers[:TRAINABLE_START_IDX]:
layer.trainable = False
The modified network architecture after the last convolutional block is given below. This is a multi-output model where the first output provides the prediction class probabilities and the second output provides the class activation map.