Sitecore smart image cropping, tags and alt text with AI: Azure Computer Vision – Part I.

Images are a critical part of today’s websites, especially with a “mobile first” approach and responsive designs. Cropping images properly is extremely important if you don’t want to ruin your website’s user experience. Imagine an e-commerce website where the product is cropped out of its picture when the user browses the site from a mobile device. :facepalm:

In this post I’ll share a way to solve this issue with the help of AI, more specifically using Azure Cognitive Services (Computer Vision).

Azure Computer Vision

The Computer Vision API provides state-of-the-art algorithms to process images and return information. For example, it can determine whether an image contains mature content, or find all the faces in an image. It also has other features, like estimating dominant and accent colors, categorizing the content of images, and describing an image with complete English sentences. Additionally, it can intelligently generate image thumbnails for displaying large images effectively. For more details about the API, refer to the official Computer Vision documentation, which also includes some good C# examples.

First Step: Create the Azure resource

Before we can play with this awesome service, we have to create the resource. The good news: the free plan is enough for your tests (20 calls per minute, 5,000 calls per month):

Log in to the Azure portal, add a new resource, search for “Computer Vision” and, as usual, follow the wizard to create it.

Then go to the “Keys and Endpoint” section and get your key, endpoint and location (region). Write them down; we’ll use them later to connect to the API.

For this implementation I’ll be using the following methods:

  • Analyze Image: This operation extracts a rich set of visual features based on the image content.
  • Get Area of Interest: This operation returns a bounding box around the most important area of the image.
  • Get Thumbnail: This operation generates a thumbnail image with the user-specified width and height. By default, the service analyzes the image, identifies the region of interest (ROI), and generates smart cropping coordinates based on the ROI. Smart cropping helps when you specify an aspect ratio that differs from that of the input image.
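To make the three operations concrete, here is a small sketch of the request URIs they map to in the v2.0 API. ComputerVisionUris is a hypothetical helper (not part of the project), and it assumes the base URL already ends with “/vision/v2.0/”, like the ApiUrl setting in the config file. The smartCropping flag is the Get Thumbnail option described above.

```csharp
using System;

// Hypothetical helper that builds the v2.0 request URIs used in this post.
// baseUrl is assumed to end with "/vision/v2.0/", like the ApiUrl setting.
public static class ComputerVisionUris
{
    // Analyze Image: visualFeatures is a comma-separated list, e.g. "Tags,Description".
    public static string Analyze(string baseUrl, string visualFeatures) =>
        $"{baseUrl}analyze?visualFeatures={visualFeatures}";

    // Get Area of Interest: no query parameters; the image travels in the request body.
    public static string AreaOfInterest(string baseUrl) => $"{baseUrl}areaOfInterest";

    // Get Thumbnail: width/height in pixels; smartCropping crops around the region of interest.
    public static string GenerateThumbnail(string baseUrl, int width, int height, bool smartCropping = true) =>
        $"{baseUrl}generateThumbnail?width={width}&height={height}&smartCropping={(smartCropping ? "true" : "false")}";
}
```

All three operations are sent as POST requests, with either an image URL or the raw image bytes as the body.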

Testing the endpoints

We can now use Postman to test the API endpoints and the results they return. This is straightforward if you follow Microsoft’s documentation:

  1. Send a POST or GET request (depending on the operation you want to test; the three operations used in this post are all POST) to the following URL: https://{yourComputerVisionService}.cognitiveservices.azure.com/vision/v2.0/{APIMethod}?{Params}
  2. Add the needed headers:
    • Ocp-Apim-Subscription-Key: Your app key from the “Keys and Endpoint” previous section.
    • Ocp-Apim-Subscription-Region: Your app region from the “Keys and Endpoint” previous section.
    • Content-Type: application/json
  3. Add the image URL to the request “Body” as JSON, for example {"url": "https://example.com/image.jpg"}.

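For reference, the Postman steps above translate to roughly the following C# request. This is a minimal sketch: PostmanEquivalent and BuildAnalyzeByUrl are hypothetical names, the visualFeatures value is just an example, and System.Text.Json is used only to keep the snippet self-contained. It builds the request without sending it, so nothing here depends on a live key.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;

public static class PostmanEquivalent
{
    // Builds the same request as the Postman test: a POST with the subscription
    // headers and a JSON body of the form {"url": "<image url>"}. It does not send it.
    public static HttpRequestMessage BuildAnalyzeByUrl(
        string endpoint, string key, string region, string imageUrl)
    {
        var request = new HttpRequestMessage(
            HttpMethod.Post,
            endpoint.TrimEnd('/') + "/vision/v2.0/analyze?visualFeatures=Tags,Description");
        request.Headers.Add("Ocp-Apim-Subscription-Key", key);
        request.Headers.Add("Ocp-Apim-Subscription-Region", region);

        // StringContent sets Content-Type: application/json; charset=utf-8.
        string json = JsonSerializer.Serialize(new { url = imageUrl });
        request.Content = new StringContent(json, Encoding.UTF8, "application/json");
        return request;
    }
}
```

To actually call the API, pass the message to a shared HttpClient with SendAsync. Note that the service class below sends the raw image bytes as application/octet-stream instead of a JSON URL, because Sitecore already has the media blob in hand.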
Let’s do a test with the following image:

Get Thumbnail

As you can see, Computer Vision returns a version of the image cropped to the width and height we passed as parameters (200×200), and it crops correctly, keeping the focus on the most important part of the picture.

Get Area Of Interest

This operation analyzes the image the same way thumbnail generation does, but instead of returning a cropped image it returns the coordinates of the area of interest. Because thumbnail generation has some limitations that I’ll explain later, I’ll use this method to crop the image.
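As I understand the v2.0 response, the area of interest comes back as an areaOfInterest object with x, y, w and h in pixels of the original image, which is all you need to crop it yourself. Below is a minimal sketch of one way to do that for a target aspect ratio: take the smallest rectangle of that ratio containing the area of interest, shrink it if it doesn’t fit the image, center it on the area of interest and shift it back inside the image. CropRect and SmartCrop are hypothetical names, and this is not necessarily how the field in the next post crops.

```csharp
using System;

// Hypothetical model mirroring the x, y, w, h box of the areaOfInterest response.
public struct CropRect
{
    public int X, Y, W, H;
    public CropRect(int x, int y, int w, int h) { X = x; Y = y; W = w; H = h; }
}

public static class SmartCrop
{
    // Returns a targetW:targetH crop that contains the area of interest when possible,
    // centered on it and clamped to the image bounds.
    public static CropRect CropAround(int imageW, int imageH, CropRect aoi, int targetW, int targetH)
    {
        double ratio = (double)targetW / targetH;

        // Grow the area of interest along one axis until it matches the target ratio.
        double cropW = Math.Max(aoi.W, aoi.H * ratio);
        double cropH = cropW / ratio;

        // If that is larger than the image, scale down to the largest crop that fits.
        double scale = Math.Min(1.0, Math.Min(imageW / cropW, imageH / cropH));
        int w = (int)Math.Round(cropW * scale);
        int h = (int)Math.Round(cropH * scale);

        // Center on the area of interest, then clamp so the crop stays inside the image.
        double cx = aoi.X + aoi.W / 2.0;
        double cy = aoi.Y + aoi.H / 2.0;
        int x = (int)Math.Round(cx - w / 2.0);
        int y = (int)Math.Round(cy - h / 2.0);
        x = Math.Max(0, Math.Min(x, imageW - w));
        y = Math.Max(0, Math.Min(y, imageH - h));
        return new CropRect(x, y, w, h);
    }
}
```

For example, for a 1000×500 image whose area of interest is the 200×100 box at (800, 0), a square crop comes out as the 200×200 box at (800, 0): centered on the subject horizontally, and pushed down so it stays inside the image.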

Analyze Image

Depending on the parameters we send, this method returns many different elements after analyzing the image, such as tags, a description, brand information, etc. I’ll use this method to generate tags and also to give the image an automatic alt text.
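As a sketch of the alt-text idea, the helper below picks the highest-confidence caption from the Description feature and falls back to the most confident tags. It assumes the v2.0 response shape (description.captions with text and confidence, tags with name and confidence) and an arbitrary 0.5 confidence threshold. AltTextPicker is a hypothetical name, and System.Text.Json is used only to stay self-contained; the service class below deserializes into its own ImageDetails model with Newtonsoft.Json instead.

```csharp
using System;
using System.Linq;
using System.Text.Json;

public static class AltTextPicker
{
    // Assumes: { "description": { "captions": [ { "text", "confidence" } ] },
    //            "tags": [ { "name", "confidence" } ] }
    public static string PickAltText(string analyzeJson, double minConfidence = 0.5)
    {
        using (JsonDocument doc = JsonDocument.Parse(analyzeJson))
        {
            JsonElement root = doc.RootElement;

            // 1) Prefer the highest-confidence caption if it clears the threshold.
            if (root.TryGetProperty("description", out JsonElement description) &&
                description.TryGetProperty("captions", out JsonElement captions))
            {
                JsonElement best = captions.EnumerateArray()
                    .OrderByDescending(c => c.GetProperty("confidence").GetDouble())
                    .FirstOrDefault();
                if (best.ValueKind == JsonValueKind.Object &&
                    best.GetProperty("confidence").GetDouble() >= minConfidence)
                {
                    return best.GetProperty("text").GetString();
                }
            }

            // 2) Otherwise fall back to up to three confident tags, e.g. "dog, sand".
            if (root.TryGetProperty("tags", out JsonElement tags))
            {
                var names = tags.EnumerateArray()
                    .Where(t => t.GetProperty("confidence").GetDouble() >= minConfidence)
                    .OrderByDescending(t => t.GetProperty("confidence").GetDouble())
                    .Take(3)
                    .Select(t => t.GetProperty("name").GetString())
                    .ToList();
                return string.Join(", ", names);
            }

            return string.Empty;
        }
    }
}
```

Falling back to tags keeps the alt attribute meaningful when the caption is too uncertain to show to screen-reader users.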

Service Implementation

Let’s now implement the API service. As a starting point, we need a service that takes care of the communication with the Computer Vision API:

The ICognitiveServices Interface:

using Sitecore.Computer.Vision.CroppingImageField.Models.AreaOfInterest;
using Sitecore.Computer.Vision.CroppingImageField.Models.ImagesDetails;
namespace Sitecore.Computer.Vision.CroppingImageField.Services
{
    public interface ICognitiveServices
    {
        ImageDetails AnalyzeImage(byte[] image);
        byte[] GetThumbnail(byte[] image, int width, int height);
        AreaOfInterestResult GetAreaOfImportance(byte[] image);
    }
}

The CognitiveServices Class:

using System;
using Newtonsoft.Json;
using Sitecore.Configuration;
using Sitecore.Diagnostics;
using System.Net.Http;
using System.Net.Http.Headers;
using Sitecore.Computer.Vision.CroppingImageField.Models.AreaOfInterest;
using Sitecore.Computer.Vision.CroppingImageField.Models.ImagesDetails;
using Sitecore.Computer.Vision.CroppingImageField.Caching;
using Sitecore.Computer.Vision.CroppingImageField.Extensions;
namespace Sitecore.Computer.Vision.CroppingImageField.Services
{
    public class CognitiveServices : ICognitiveServices
    {
        private readonly string _cognitiveServicesKey = Settings.GetSetting($"Sitecore.Computer.Vision.CroppingImageField.AICroppingField.CognitiveServices.ApiKey", "");
        private readonly string _cognitiveServicesUrl = Settings.GetSetting($"Sitecore.Computer.Vision.CroppingImageField.AICroppingField.CognitiveServices.ApiUrl", "");
        private readonly string _cognitiveServicesZone = Settings.GetSetting($"Sitecore.Computer.Vision.CroppingImageField.AICroppingField.CognitiveServices.ApiZone", "");
        public ImageDetails AnalyzeImage(byte[] image)
        {
            var requestUri = _cognitiveServicesUrl + "analyze?" + Settings.GetSetting(
            $"Sitecore.Computer.Vision.CroppingImageField.AICroppingField.CognitiveServices.Analyze.Parameters", "");
            return CacheManager.GetCachedObject(image.GetHashKey() + requestUri, () =>
            {
                using (var response = this.CallApi(image, requestUri))
                {
                    if (response.IsSuccessStatusCode)
                    {
                        var result = response.Content.ReadAsStringAsync().GetAwaiter().GetResult();
                        var responseData =
                            JsonConvert.DeserializeObject<ImageDetails>(result, new JsonSerializerSettings());
                        return responseData;
                    }
                    var errorMessage = response.Content.ReadAsStringAsync().GetAwaiter().GetResult();
                    Log.Error(errorMessage, this);
                    return null;
                }
            });
        }
        public byte[] GetThumbnail(byte[] image, int width, int height)
        {
            var requestUri = _cognitiveServicesUrl +
                $"generateThumbnail?width={width}&height={height}&{Constants.QueryStringKeys.SmartCropping}=true";
            return CacheManager.GetCachedObject(image.GetHashKey() + requestUri, () =>
            {
                using (var response = this.CallApi(image, requestUri))
                {
                    if (response.IsSuccessStatusCode)
                    {
                        return response.Content.ReadAsByteArrayAsync().GetAwaiter().GetResult();
                    }
                    var errorMessage = response.Content.ReadAsStringAsync().GetAwaiter().GetResult();
                    Log.Error(errorMessage, this);
                    return null;
                }
            });
        }
        public AreaOfInterestResult GetAreaOfImportance(byte[] image)
        {
            var requestUri = _cognitiveServicesUrl + "areaOfInterest";
            return CacheManager.GetCachedObject(image.GetHashKey() + requestUri, () =>
            {
                using (var response = this.CallApi(image, requestUri))
                {
                    if (response.IsSuccessStatusCode)
                    {
                        var result = response.Content.ReadAsStringAsync().GetAwaiter().GetResult();
                        var responseData = JsonConvert.DeserializeObject<AreaOfInterestResult>(result, new JsonSerializerSettings());
                        return responseData;
                    }
                    var errorMessage = response.Content.ReadAsStringAsync().GetAwaiter().GetResult();
                    Log.Error(errorMessage, this);
                    return null;
                }
            });
        }
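        // One shared HttpClient: creating a client per call can exhaust sockets under load.
        private static readonly HttpClient Client = new HttpClient();
        private HttpResponseMessage CallApi(byte[] image, string requestUri)
        {
            using (var request = new HttpRequestMessage(HttpMethod.Post, requestUri))
            {
                // Auth headers are set per request so the shared client holds no state.
                request.Headers.Add("Ocp-Apim-Subscription-Key", _cognitiveServicesKey);
                request.Headers.Add("Ocp-Apim-Subscription-Region", _cognitiveServicesZone);
                // The raw image bytes are the body, hence application/octet-stream.
                request.Content = new ByteArrayContent(image);
                request.Content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
                // The response body is buffered by default, so it stays readable after the request is disposed.
                return Client.SendAsync(request).GetAwaiter().GetResult();
            }
        }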
    }
}
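The class relies on two helpers the post does not show: the GetHashKey() extension on byte[] and CacheManager.GetCachedObject. Here is a minimal sketch of what they might look like, assuming a SHA-256 hash of the image bytes as the key and an in-process ConcurrentDictionary as the store. Both implementations are hypothetical; the real project may well use Sitecore’s own caching instead.

```csharp
using System;
using System.Collections.Concurrent;
using System.Security.Cryptography;

public static class ByteArrayExtensions
{
    // Hypothetical GetHashKey(): SHA-256 of the bytes as uppercase hex, so identical
    // images map to the same cache key regardless of item or file name.
    public static string GetHashKey(this byte[] data)
    {
        using (SHA256 sha = SHA256.Create())
        {
            return BitConverter.ToString(sha.ComputeHash(data)).Replace("-", "");
        }
    }
}

public static class CacheManager
{
    private static readonly ConcurrentDictionary<string, object> Cache =
        new ConcurrentDictionary<string, object>();

    // Hypothetical GetCachedObject(): returns the cached value for key, or runs factory
    // and stores its result. Null results are not stored, so failed calls are retried.
    public static T GetCachedObject<T>(string key, Func<T> factory) where T : class
    {
        if (Cache.TryGetValue(key, out object cached))
        {
            return (T)cached;
        }
        T value = factory();
        if (value != null)
        {
            Cache[key] = value;
        }
        return value;
    }
}
```

Not caching null means a failed call (for example a throttled free-plan request) is retried next time instead of being remembered. Two concurrent callers can still both run the factory for the same key; for idempotent API calls that is usually acceptable.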

The config file (these settings go inside the <sitecore><settings> section of a Sitecore include patch file):

      <setting name="Sitecore.Computer.Vision.CroppingImageField.AICroppingField.ThumbnailsFolderId" value="{C3EC5BF1-2182-40AB-AEE7-B2AE3C292620}" />
      <setting name="Sitecore.Computer.Vision.CroppingImageField.AICroppingField.CognitiveServices.ApiKey" value="{YOUR_APP_KEY}" />
      <setting name="Sitecore.Computer.Vision.CroppingImageField.AICroppingField.CognitiveServices.ApiUrl" value="https://{YOUR_AZURE_SERVICE_URL}.cognitiveservices.azure.com/vision/v2.0/" />
      <setting name="Sitecore.Computer.Vision.CroppingImageField.AICroppingField.CognitiveServices.ApiZone" value="{YOUR_ZONE}" />
      <setting name="Sitecore.Computer.Vision.CroppingImageField.AICroppingField.CognitiveServices.Analyze.Parameters" value="visualFeatures=Brands,Categories,Description,Faces,Objects,Tags" />

So now our Azure Computer Vision resource is created, and our code is ready to connect to it and play with it.

In the next post, I’ll create a custom Sitecore image field that uses this implementation to solve the cropping issues and also automatically adds the generated alt text to the image. I’ll share the code on GitHub as well as a plugin package. Stay tuned!

