Automating Plant Recognition

Ubajaka CJ
2 min read · Dec 29, 2020
Automating plant recognition using leaf classification

Kaggle competitions have a playful side to them. As Albert Einstein said, ‘Creativity is intelligence having fun’. Creativity of the mind begins in a state of ecstasy. ‘The essential thing is to be in a state of ecstasy,’ writes the fresco painter Louis Dussour, ‘yet trying all the while to understand the connections and the structure of things’.[1] This applies just as well to the creativity involved in building Data Science products. So much for the digression.

The objective of this playground competition is to use binary leaf images and extracted features — shape, margin, texture — to accurately identify the 99 species of plants. Leaves are an effective means of distinguishing plant species, given their volume, prevalence, and unique characteristics.

So, we set up our Data Science platform and import the modules we need. We use fastai’s fastai.vision module for image classification. Before any meaningful work can be done, the leaf classification dataset needs to be converted into a DataBunch object; ImageDataBunch is the DataBunch subclass for computer vision data. Here we create our data from a CSV file with ImageDataBunch.from_csv. It reads the images from the directory named by the folder parameter, takes the labels from the file named by csv_labels, and splits the data between a training and a validation set. An optional test folder contains unlabelled data, and the suffix parameter adds an optional suffix to the filenames listed in csv_labels. The ds_tfms argument applies the transforms returned by get_transforms — a utility function that easily creates a list of flip, rotate, warp, and lighting transforms — and the normalize method standardizes the pixel values, in this case using imagenet_stats.

data = ImageDataBunch.from_csv(leaf_path, folder='images', csv_labels='train.csv',
                               suffix='.jpg', ds_tfms=get_transforms(),
                               size=224, bs=64).normalize(imagenet_stats)
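Under the hood, normalize(imagenet_stats) subtracts the standard ImageNet channel means and divides by the channel standard deviations. A minimal sketch of that per-channel operation — using NumPy, a channels-last layout for easy broadcasting (fastai itself works channels-first on tensors), and a random array standing in for a real image batch:

```python
import numpy as np

# Standard ImageNet RGB channel statistics, as shipped in imagenet_stats
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

def normalize(batch, mean, std):
    """Channel-wise normalization of a batch shaped (n, height, width, 3)."""
    return (batch - mean) / std

# A stand-in batch of two 224x224 RGB images with pixel values in [0, 1]
rng = np.random.default_rng(0)
batch = rng.random((2, 224, 224, 3))

normed = normalize(batch, mean, std)
```

Normalizing with the same statistics the pretrained backbone saw during ImageNet training keeps the input distribution in the range the network expects.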

There are 99 classes in the dataset, corresponding to the 99 species of plants, so this is a multi-class classification problem. We load our prepared data into a CNN learner with cnn_learner, find a good learning rate, and train the model.

learn = cnn_learner(data, models.resnet152, metrics=error_rate, pretrained=True)
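Finding the learning rate with fastai’s learn.lr_find() follows Leslie Smith’s learning-rate range test: train briefly while increasing the learning rate exponentially, record the loss at each step, and pick a rate where the loss is still falling steeply. A toy sketch of the idea on a one-parameter quadratic loss — the loss function, starting point, and step counts here are illustrative, not fastai’s actual implementation:

```python
def loss(w):
    """Toy quadratic loss with its minimum at w = 3."""
    return (w - 3.0) ** 2

def grad(w):
    """Gradient of the toy loss."""
    return 2.0 * (w - 3.0)

def lr_range_test(w0, lr_min=1e-6, lr_max=2.0, steps=100):
    """Grow the learning rate exponentially from lr_min to lr_max,
    taking one gradient step per trial rate and recording (lr, loss)."""
    w = w0
    history = []
    for i in range(steps):
        lr = lr_min * (lr_max / lr_min) ** (i / (steps - 1))
        w -= lr * grad(w)
        history.append((lr, loss(w)))
    return history

history = lr_range_test(w0=10.0)
```

Plotting history reproduces the familiar lr_find curve: flat at tiny rates, dropping steeply in the useful range, and unstable once the rate is too large.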

The error rate at the time of the experiment was 0.090909; it can be improved with further tuning. The output of the experiment is a CSV file containing a set of predicted probabilities for each image.
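The submission file pairs each test image id with a probability for every species. A minimal sketch of turning raw model scores into such a file with a softmax, using only the standard library — the ids, species names, and scores below are made up for illustration; in the real notebook the probabilities come from the trained learner over all 99 species:

```python
import csv
import io
import math

def softmax(scores):
    """Convert raw scores to probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical species and per-image scores (the real run has 99 species)
species = ['acer_opalus', 'quercus_rubra', 'salix_alba']
scores_by_id = {1: [2.0, 0.5, -1.0], 2: [0.1, 3.2, 0.3]}

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(['id'] + species)
for img_id, scores in scores_by_id.items():
    writer.writerow([img_id] + [f'{p:.6f}' for p in softmax(scores)])
submission = buf.getvalue()
```

Each row then sums to one across the species columns, which is the format the competition’s log-loss metric expects.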

The full code can be found here.

References

[1] Antonin-Dalmace Sertillanges O. P., The Intellectual Life: Its Spirit, Conditions, Methods. The Newman Press, Westminster, Maryland, 1960.
