Let’s start with a brief explanation of how the algorithm identifies an object. In the image below, the red circles indicate the input layer. The input can be anything you are trying to detect or identify, for example a human, chicken, dog or cat. Because of the nature of the security industry, we currently monitor only for human or non-human activity, so our categories are broader: Human, Animal, Vehicle and Background. After an input is detected, in this example a dog, the algorithm needs to identify what that input is. It does this using various layers of intelligence that are all connected, similar to the neurons in the brain.




The neural network layers (indicated by the yellow circles) are all connected, and each layer performs part of the identification. Within these layers a library is established from the images that the device has been taught. All devices come with some intelligence pre-configured, but it is imperative that the device also receives site-specific learning at this point.
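As a rough illustration of what “connected layers” means, here is a minimal sketch of a tiny fully connected classifier with one output per category. The layer sizes, random weights and category order are assumptions made purely for this sketch and are not the actual Avlytics model:

```python
import numpy as np

# Illustrative only: a tiny fully connected network with one hidden layer.
# The real model is far larger; the layer sizes, random weights and the
# category order here are assumptions made purely for this sketch.
CATEGORIES = ["Human", "Animal", "Vehicle", "Background"]

rng = np.random.default_rng(0)
W1 = rng.normal(size=(64, 16))   # input features -> hidden "neurons"
W2 = rng.normal(size=(16, 4))    # hidden "neurons" -> one output per category

def classify(features):
    """Push a 64-value feature vector through the connected layers and
    return a percentage match rating for each category."""
    hidden = np.maximum(0, features @ W1)        # ReLU activation
    logits = hidden @ W2
    scores = np.exp(logits - logits.max())
    scores /= scores.sum()                       # softmax -> percentages
    return {cat: round(float(p) * 100, 1) for cat, p in zip(CATEGORIES, scores)}

# With random weights the ratings are arbitrary; training is what makes them useful.
print(classify(rng.normal(size=64)))
```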


Initially the image is matched against simple shapes from the library. The shapes are classified as a possible match per category by giving a percentage match rating to each category (Human, Animal, Vehicle or Background). For example, let’s say that the image of the dog contained curves and lines that resembled the curves and lines of both animal and human bodies.



The next step is for the neural network to identify more complex structures in the image to establish to what degree the input matches each category. Lastly, a decision is made based on the information gathered from previously trained images.
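A hedged sketch of these two steps, scoring the input against learnt examples per category and then deciding on the best match; the library contents and the similarity measure are assumptions made for illustration only:

```python
import numpy as np

# Hypothetical illustration of the two-stage idea described above:
# score the input against learnt examples per category, then decide.
LIBRARY = {                       # feature vectors "learnt" per category (invented)
    "Human":      np.array([0.9, 0.2, 0.1]),
    "Animal":     np.array([0.7, 0.6, 0.1]),
    "Vehicle":    np.array([0.1, 0.1, 0.9]),
    "Background": np.array([0.2, 0.2, 0.2]),
}

def match_ratings(features):
    """Return a percentage match rating per category (higher = closer match)."""
    sims = {cat: float(features @ ref) /
                 (np.linalg.norm(features) * np.linalg.norm(ref))
            for cat, ref in LIBRARY.items()}
    total = sum(sims.values())
    return {cat: round(100 * s / total, 1) for cat, s in sims.items()}

def decide(features):
    """Final decision: the category with the highest rating."""
    ratings = match_ratings(features)
    return max(ratings, key=ratings.get)

dog = np.array([0.75, 0.55, 0.05])   # curves/lines resembling an animal body
print(match_ratings(dog))            # Animal and Human both score high
print(decide(dog))                   # -> 'Animal'
```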

We suggest training the device with the clearest, best possible images, as per our training guide. Here is where some complications may arise. If you are using a camera to detect humans within its recommended frame size, a good image of a human will be easy to detect. If, however, you are using a camera rated for fair-quality detection at 15 meters to detect humans at a distance of 50 meters, the shapes of the human in the library of your Avlytics device no longer match the shapes of the input image. See below:



If it is not possible to mount another camera for better-quality detection, you now need to create a workaround by training your device on good human images at each of the distances indicated by the man with the hard hat in the picture above. The training sets must all be proportionate to each other so that one category does not outweigh another. To establish whether the images in your training sets are correct, we review the Validation Metrics on your Grafana Maintenance dashboard for the specific site and camera number, as in the example graph below.
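As an aside on keeping training sets proportionate, here is a rough sketch of how the category folders could be counted and compared. The folder layout, file extension and the 20% tolerance are assumptions for this illustration, not a documented Avlytics tool:

```python
from pathlib import Path

CATEGORIES = ["Human", "Animal", "Vehicle", "Background"]

def check_balance(training_root, tolerance=0.20):
    """Count the images per category folder and flag any category that is
    more than `tolerance` away from the average count."""
    counts = {cat: len(list(Path(training_root, cat).glob("*.jpg")))
              for cat in CATEGORIES}
    target = sum(counts.values()) / len(counts)
    for cat, n in counts.items():
        flag = "" if abs(n - target) <= tolerance * target else "  <-- rebalance"
        print(f"{cat:<12}{n:>6}{flag}")

# Hypothetical path for a specific site and camera number.
check_balance("training_sets/site_01/camera_1")
```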



This graph shows the predictions over time that would be made based on the training the device has received. The algorithm validates the images sent into the training set and provides metrics that can be plotted on a graph to better represent the actual learning.

Let’s investigate closely what these numbers mean. The Training Feedback or Validation Scores indicated in the image are an analysis of the folders (or neurons) containing the library of images you have trained and identified as Human, Animal, Vehicle and Background.



In this example, “channel 1: Predicting Background – For: Animal”, we have an average score of 100 and a current score of 100. That means that 100% of the time, when attempting to validate the images in your Animal folder, your device is classifying them as Background images (this can also be noted in “channel 1: Predicting Animal – For: Background” with a score of 0%).

You can now gather from this chart that your validation set (the images trained) is not very good and that your device has not yet learnt what animals look like, at various distances, for this specific site. You will now need to train your device on more animal images and more background images for it to be able to make the differentiation.

Let’s look at one more value as an example: “channel 1: Predicting Background – For: Human Presence” averages at 17%, meaning that 17% of the time it is falsely identifying the trained images in your Human Presence folder as Background images.
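To make the “Predicting <category> – For: <folder>” readings concrete, here is a small, hypothetical reconstruction of how such percentages could be computed from validation results; the sample numbers below are invented to mirror the examples above:

```python
from collections import Counter

# Invented validation results: actual folder -> what the device predicted
# for each image in that folder.
validation_results = {
    "Animal":         ["Background"] * 10,                        # all misread
    "Human Presence": ["Human Presence"] * 10 + ["Background"] * 2,
    "Background":     ["Background"] * 12,
}

for actual, predictions in validation_results.items():
    counts = Counter(predictions)
    for predicted, n in counts.items():
        pct = 100 * n / len(predictions)
        print(f"Predicting {predicted} - For: {actual}: {pct:.0f}%")
# e.g. "Predicting Background - For: Animal: 100%"
#      "Predicting Background - For: Human Presence: 17%"
```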

Think of your Classification Options or “Include Tags” as individual neurons, similar to those of a human brain.

When we train a device, the “neurons” for each channel are charged with features extracted from the sample images we have sent to the device.
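A toy sketch of that idea, where each Include Tag accumulates the features extracted from its sample images; the feature names are invented for illustration:

```python
# Hypothetical sketch of "charging" a neuron: each Include Tag accumulates the
# features extracted from the sample images submitted for it.
neurons = {"Human": set(), "Animal": set(), "Vehicle": set(), "Background": set()}

def train(tag, extracted_features):
    """Add the features extracted from one sample image to that tag's neuron."""
    neurons[tag] |= extracted_features

train("Human", {"upright_shape", "two_legs"})
train("Human", {"upright_shape", "shadow"})
print(neurons["Human"])   # {'upright_shape', 'two_legs', 'shadow'}
```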

To correct the above example, we need to improve the algorithm’s understanding of Human Presence by submitting more Human samples.

It is possible for an image to have equally weighted scores for two neurons; here is why that happens.

Suppose AME predicts Human for Background with a value of 50% and Animal for Background at 50%.
This indicates that the features extracted from the sample images submitted through training are common to both the Human and Animal categories. It is also likely that these features are common to the features observed in your device’s Background training samples.

This means there are not enough differentiating features in each of these categories to unequivocally tag the detection as a specific category.
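Here is a small, invented example of how such a 50/50 split can arise when the extracted features overlap both categories:

```python
# Illustrative only: two categories end up with equal weight because the
# features extracted from the Background samples overlap both of them.
# The feature names below are invented for the sketch.
features_by_tag = {
    "Human":      {"upright_shape", "two_legs", "shadow", "grass_texture"},
    "Animal":     {"four_legs", "fur_texture", "shadow", "grass_texture"},
    "Background": {"shadow", "grass_texture"},   # shares features with both
}

detection = {"shadow", "grass_texture"}          # a Background event

overlaps = {tag: len(detection & feats)
            for tag, feats in features_by_tag.items() if tag != "Background"}
total = sum(overlaps.values())
for tag, n in overlaps.items():
    print(f"Predicting {tag} for Background: {100 * n / total:.0f}%")
# Predicting Human for Background: 50%
# Predicting Animal for Background: 50%
```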

To "charge" the Neuron with more distinguishing Features we simply train the device for Human, Animal and Background by submitting more Samples.

After more training has been provided to the device, you will see the graph start to decrease, lowering the probability of a Background prediction for an Animal event. After training the device, we can see the graph invert and the algorithm start to bias Animal predictions for Animal events, due to increased training and better understanding.
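Purely as an illustration (these numbers are invented, not real dashboard data), the trend might look like this as training rounds are added:

```python
# Hypothetical trend for the Animal folder as more training samples are submitted.
rounds = [0, 1, 2, 3, 4]
predicting_background_for_animal = [100, 80, 55, 30, 10]   # should fall
predicting_animal_for_animal     = [0, 20, 45, 70, 90]     # should rise

for r, bg, an in zip(rounds, predicting_background_for_animal,
                     predicting_animal_for_animal):
    print(f"training round {r}: Background {bg:>3}%  |  Animal {an:>3}%")
# Around rounds 2-3 the two lines cross: the graph "inverts" and the
# algorithm starts to bias Animal predictions for Animal events.
```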