Dockershim (Docker) runtime deprecation on AKS and how to fix our Sitecore containers

If you recently upgraded AKS to v1.23 or later, you might have noticed that some containers stopped working, most often failing to start with messages like “Error: failed to start containerd task "solr": hcs::System::CreateProcess solr: The system cannot find the file specified.: unknown”.

We spent quite some time researching these issues and also contacted Sitecore support, so I decided to write this post in case it helps anyone else facing the same kind of problem.

Although the Docker runtime is still available in v1.23, Containerd is now the default, which is why you get this kind of error. Bear in mind that Docker is going to be fully removed in v1.24, so I suggest you take action as soon as possible to avoid blocked upgrades or issues later.
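To confirm which runtime each node is actually using, you can ask Kubernetes directly. This is a minimal sketch, assuming kubectl is installed and already pointed at your AKS cluster; the column names NAME, OS and RUNTIME in the second command are arbitrary labels chosen for this example.

```powershell
# The wide node listing includes a CONTAINER-RUNTIME column, whose value
# starts with containerd:// or docker:// depending on the node's runtime.
kubectl get nodes -o wide

# Same information, trimmed to node name, OS image and runtime version.
# The argument is quoted so PowerShell does not split it at the commas.
kubectl get nodes -o "custom-columns=NAME:.metadata.name,OS:.status.nodeInfo.osImage,RUNTIME:.status.nodeInfo.containerRuntimeVersion"
```

Node pools that still report docker:// are the ones whose behaviour will change once they move to Containerd.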

About Sitecore default images

If you are using Sitecore images <= v10.0.2, you will find them failing to start, mostly the solr-init and mssql-init containers.

Sitecore has fixed the images for the following versions:

  • 10.0.3
  • 10.1.2
  • 10.1.3
  • 10.2.0
  • 10.2.1

Note that if you reference the images using the “two-digit” tag (major and minor version only), you’re good to go, as you will be getting the latest version.

About the runtime deprecation

Kubernetes announced the deprecation of Docker as a container runtime (dockershim) in version 1.20:

Dependency on Docker explained

A container runtime is software that can execute the containers that make up a Kubernetes pod. Kubernetes is responsible for orchestration and scheduling of Pods; on each node, the kubelet uses the Container Runtime Interface (CRI) as an abstraction so that you can use any compatible container runtime.

In its earliest releases, Kubernetes offered compatibility with one container runtime: Docker. Later in the Kubernetes project’s history, cluster operators wanted to adopt additional container runtimes. The CRI was designed to allow this kind of flexibility – and the kubelet began supporting CRI. However, because Docker existed before the CRI specification was invented, the Kubernetes project created an adapter component, dockershim. The dockershim adapter allows the kubelet to interact with Docker as if Docker were a CRI compatible runtime.

Switching to Containerd as a container runtime eliminates the middleman. All the same containers can be run by container runtimes like Containerd as before. But now, since containers are scheduled directly by the container runtime, they are not visible to Docker. So any Docker tooling or fancy UI you might have used before to check on these containers is no longer available.

You cannot get container information using docker ps or docker inspect commands. As you cannot list containers, you cannot get logs, stop containers, or execute something inside a container using docker exec.
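In practice, most of those day-to-day checks can be done through kubectl instead. The sketch below shows rough equivalents; the namespace "sitecore" and the pod name "cm-0000" are hypothetical placeholders, so substitute the values from your own cluster.

```powershell
# Hypothetical names, for illustration only.
$ns  = "sitecore"
$pod = "cm-0000"

# Roughly "docker ps": list the pods and their status in the namespace.
kubectl get pods -n $ns

# Roughly "docker inspect": show the spec, state and recent events of a pod.
kubectl describe pod $pod -n $ns

# Roughly "docker logs": print the logs of the pod's container.
kubectl logs $pod -n $ns

# Roughly "docker exec": open an interactive PowerShell session in the pod.
kubectl exec -it $pod -n $ns -- powershell
```

For pods with more than one container, add -c followed by the container name to the logs and exec commands.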

Please refer to the official documentation for deeper details:

About our custom images

OK, now that things are a bit clearer, and we know the Sitecore base images are fixed in the latest versions (at least for v10), what about our custom ones?

So far, I’ve identified some changes required in our Dockerfiles to make them work as expected on the Containerd runtime.

ENTRYPOINT and CMD

The syntax is slightly different; I’ll share examples to make the changes easier to follow.

This is the original Dockerfile:

ENTRYPOINT .\StartInit.ps1 -ResourcesDirectory $env:RESOURCES_PATH -SqlServer $env:SQL_SERVER -SqlAdminUser $env:SQL_ADMIN_LOGIN -SqlAdminPassword $env:SQL_ADMIN_PASSWORD -SitecoreAdminUsername $env:SITECORE_ADMIN_USERNAME -SitecoreAdminPassword $env:sitecore_admin_password -SitecoreUserPassword $env:SITECORE_USER_PASSWORD -SqlElasticPoolName $env:SQL_ELASTIC_POOL_NAME -DatabasesToDeploy $env:DATABASES_TO_DEPLOY -PostDeploymentWaitPeriod $env:POST_DEPLOYMENT_WAIT_PERIOD `
    -DatabaseUsers @(...)@]

The updated one:

ENTRYPOINT ["powershell.exe", ".\\StartInit.ps1", "-ResourcesDirectory $env:RESOURCES_PATH", "-SqlServer $env:SQL_SERVER", "-SqlAdminUser $env:SQL_ADMIN_LOGIN", "-SqlAdminPassword $env:SQL_ADMIN_PASSWORD", "-SitecoreAdminUsername $env:SITECORE_ADMIN_USERNAME", "-SitecoreAdminPassword $env:sitecore_admin_password", "-SitecoreUserPassword $env:SITECORE_USER_PASSWORD", "-SqlElasticPoolName $env:SQL_ELASTIC_POOL_NAME", "-DatabasesToDeploy $env:DATABASES_TO_DEPLOY", "-PostDeploymentWaitPeriod $env:POST_DEPLOYMENT_WAIT_PERIOD", `
    "-DatabaseUsers @(...)@]

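Note that the exec (JSON array) form of ENTRYPOINT does not wrap the command in a shell, so we now need to name the shell (powershell.exe) explicitly as the first element.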

Here is the updated solr-init Dockerfile:

ENTRYPOINT ["powershell.exe", ".\\Start.ps1", "-SitecoreSolrConnectionString $env:SITECORE_SOLR_CONNECTION_STRING", `
    "-SolrCorePrefix $env:SOLR_CORE_PREFIX_NAME", `
    "-SolrSitecoreConfigsetSuffixName $env:SOLR_SITECORE_CONFIGSET_SUFFIX_NAME", `
    "-SolrReplicationFactor $env:SOLR_REPLICATION_FACTOR", `
    "-SolrNumberOfShards $env:SOLR_NUMBER_OF_SHARDS", `
    "-SolrMaxShardsPerNodes $env:SOLR_MAX_SHARDS_NUMBER_PER_NODES", `
    "-SolrXdbSchemaFile .\\data\\schema.json", `
    "-SolrCollectionsToDeploy $env:SOLR_COLLECTIONS_TO_DEPLOY"] 

I hope this helps you understand and fix your environments deployed on AKS as you upgrade them.

Troubleshooting performance on your containerized Sitecore instances with dotTrace, dotMemory and PerfView

In the following videos I show how to use dotTrace to capture a profiling session and how to take a memory dump, so you can analyze and troubleshoot performance issues of an application running in Docker containers.

In my previous post you can find a quick way to get your Sitecore Demo up and running, have a look!

Profile Sitecore running in Docker containers

Getting a memory dump from a container

I hope this helps you with your performance troubleshooting when running Docker containers!

Lighthouse Demo is now available in the Sitecore Container Registry, let’s try it!

The Lighthouse Demo joins the list of Docker images available in the Sitecore container registry (SCR). Let’s compose those images and have a look at this Sitecore 10 + SXA showcase!

This post assumes you’re familiar with Docker, have the latest Docker Desktop version installed on Windows 10 (1809 or higher), and have a valid Sitecore license.

Spinning up a Sitecore environment has never been easier thanks to Docker containers.

Please refer to GitHub for more details about the Lighthouse demo, and find the detailed instructions here.

Let’s go!

Before starting, clone the repository locally. Then follow these steps:

  1. Open PowerShell with admin rights and navigate to your repository clone folder: cd C:\Projects\Sitecore.Demo.Platform
  2. Create certificates and initialize the environment file: .\init.ps1 -InitEnv -LicenseXmlPath C:\license\license.xml -AdminPassword b (you can change the admin password and the license.xml file path to match your needs).
  3. Pull the latest demo Docker images: docker-compose pull
  4. Stop the IIS service: iisreset /stop
  5. Start the demo containers: docker-compose up -d
  6. Check the progress of the initialization by viewing the init container’s logs: docker-compose logs -f init

Troubleshooting errors

  1. If you get the following error

“ERROR: for traefik Cannot start service traefik: failed to create endpoint sitecore-xp0_traefik_1 on network nat: failed during hnsCallRawResponse: hnsCall failed in Win32: The process cannot access the file because it is being used by another process. (0x20)”

This normally means a port that is needed is already in use by another service. Make sure none of these ports are in use: 443, 8079, 8081, 8984, and 14330. Have a look here for more details.

In my case I had the port 8079 in use by Java:

Just stopping that service fixed the issue. (Also make sure you stopped IIS, as mentioned in the steps above.)
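A quick way to find out which process is holding one of those ports is to query the listening TCP connections from PowerShell. This is a minimal sketch, assuming a Windows host where the Get-NetTCPConnection cmdlet is available; the ProcessName column is a calculated property added for readability.

```powershell
# Ports the demo needs, as listed above.
$ports = 443, 8079, 8081, 8984, 14330

# Show any listener on those ports together with its owning process.
# No output means none of the ports is currently taken.
Get-NetTCPConnection -LocalPort $ports -State Listen -ErrorAction SilentlyContinue |
    Select-Object LocalPort, OwningProcess,
        @{ Name = 'ProcessName'; Expression = { (Get-Process -Id $_.OwningProcess).ProcessName } }
```

Running it from an elevated prompt helps, since some system-owned processes may not be resolvable otherwise.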

If the issue persists even though none of the needed ports are in use, you can also try this:

# Restart the Host Network Service (hns) while Docker is stopped
Stop-Service docker
Stop-Service hns
Start-Service hns
Start-Service docker
# Remove unused Docker networks
docker network prune

That’s it!

  • When you’re done with the demo, just stop it: docker-compose stop

I hope you find it interesting and enjoy it as much as I did: a really quick and easy way to get a showcase Sitecore instance running locally in a few minutes!