Container Insights: Dissecting the World's Most Popular Containers

Join Ayse Kaya in this series as she creates her 2022 Container Report, Chock-Full of Important Security Findings for Developers.
Ayse Kaya
Sep 28, 2022

At Slim.AI, we care about what’s inside your containers. Indeed, we process hundreds or thousands of containers on our platform every day and are constantly trying to understand what makes containers safe, secure, and easy for developers to work with.

This “application intelligence” is what helps us better understand what can be removed from a given container and what needs to stay for the container to operate. With an increased focus on supply chain security these days, knowing what’s in your container and shipping only what you need to production is more important than ever.

Since we began the company in 2020, our data research team has been analyzing the most popular public containers, collecting data on every new version we see in our system and running a variety of container analysis tools against them to better understand the challenges developers, security teams, and technology leadership face in the cloud-native landscape.

In this series of blog posts, along with a series of associated conference talks culminating in a Keynote talk at KubeCon North America 2022, I’ll be sharing our findings in hopes that we can shed light on the modern container landscape and suggest some ways we can make it safer, more accessible, and less complicated for developers to adopt.

How We Developed Our Container Intelligence

Gathering data and insights on containers has given teams that work with containers the development direction they need to solve some truly lucrative problems in the container landscape today.

As Head of Strategic Insights and Analytics at Slim.AI, a developer-focused startup, I can say that containers are our world. Our focus is developer experience around container best practices: we have container enthusiasts using our various tools every day, scanning containers, optimizing them, and sharing their experience with us. We thought it would be interesting to find out what's inside the public images that serve as starting points for nearly all modern software development. So, we looked. Here's what we found.

The Methodology & Purpose: How We Chose the Top Public Containers

This raises the question: why these 130 containers out of the 10M+ containers on Docker Hub? We started with these 130 containers for the 2021 report, but have since expanded to look at an even wider variety of images. We have also started looking at specific tags (like latest, slim, alpine) and tracking changes to a single container (say, node:latest) over time for longitudinal studies.

After extracting the data from Docker Hub, the Slim.AI data team found that these 130 containers accounted for over 31 billion of Docker Hub's more than 120 billion container pulls in 2020. (Total pulls more than doubled the following year, to over 318 billion in 2021.) These containers include the most popular build tools, programming languages, data stores, and DevOps tools, among others.
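To put those pull counts in proportion, here is a minimal back-of-the-envelope sketch. It uses only the rounded figures quoted above, which are lower bounds ("over 31 billion", "more than 120 billion", "over 318 billion"), so the results are approximate ratios, not numbers taken from the report itself.

```python
# Rounded figures quoted in the text, in billions of pulls.
# All three are stated as lower bounds, so the ratios are approximate.
TOP_130_PULLS_2020 = 31
ALL_PULLS_2020 = 120
ALL_PULLS_2021 = 318

# Share of all 2020 Docker Hub pulls that went to the 130 containers.
share_2020 = TOP_130_PULLS_2020 / ALL_PULLS_2020

# Year-over-year growth in total pulls.
growth = ALL_PULLS_2021 / ALL_PULLS_2020

print(f"Top-130 share of 2020 pulls: {share_2020:.1%}")
print(f"2021 pulls relative to 2020: {growth:.2f}x")
```

In other words, roughly a quarter of all pulls in 2020 went to a set of 130 images out of more than 10 million.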

Even more compelling is the fact that among these 130 containers are images that have been pulled individually more than one billion times.

In addition to the quantitative data, we added a layer of qualitative data through interviews with developers and container enthusiasts. In these conversations, we wanted to understand which containers and tools they use in their stacks, how they package and ship applications, and their risk appetite, or even their awareness of risk.

The main driver behind investing in this report was, first and foremost, intellectual curiosity. How developers experience containers in the wild, and the challenges they run into trying to make them both easier to work with and more secure, is fascinating. At Slim.AI, we believe developer experience needs to be at the forefront of every developer tool. With the prevalence and exponential growth of container use, developer experience becomes central to scalability and to containers' continued use as a widely adopted developer tool. As vulnerabilities are discovered and the attack surface grows, speed, performance, and deployment frequency are affected too, and these have become important metrics in engineering organizations.

What We Found: A Complex Landscape

To shine a light on all of these factors, our report focused on three key pillars of container adoption and use as they affect developer experience (and even we were surprised by the results):

  1. Size & Scan time: We found a nearly perfect correlation between these two variables, validating the hypothesis that bloated containers are a time sink for your CI/CD pipelines.
    • For every 500 MB added, we saw a 50-second increase in scan time.
    • This number may seem trivial for shipping a single container to production, but scale changes the entire dynamic. In a typical organization where thousands of images are in use and hundreds of developers ship images multiple times a day, this means real productivity losses for companies that aren’t optimizing their containers.
  2. Complexity hinders clarity even for experts
    • We did a component analysis looking at all the relevant variables including packages, shells, libraries, licenses, and special permissions.
    • We were expecting large outliers, but it turned out that even the averages were surprisingly high. It is typical to see hundreds of packages even in small, special-purpose containers.
  3. Attack surface is more than just a vulnerability count:
    • Don’t get us wrong: the vulnerability counts were mind-blowing. Some of the popular containers we looked at had more than 2,000 known vulnerabilities in them.
    • But what was really surprising was the severity distribution of these vulnerabilities: 20% of them fell into the high or critical severity category.
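The size and scan-time finding in the first pillar can be turned into a rough estimate. The sketch below assumes the relationship is linear at the reported rate (about 50 seconds per 500 MB) and extrapolates it to a team. The function names and the organization figures (200 developers, 3 scans per developer per day) are hypothetical, chosen only for illustration, and are not taken from the report.

```python
# Reported rate: roughly 50 extra seconds of scan time per 500 MB of image size.
SECONDS_PER_500_MB = 50


def extra_scan_seconds(extra_mb: float) -> float:
    """Estimate added scan time for extra_mb of image size (linear extrapolation)."""
    return extra_mb / 500 * SECONDS_PER_500_MB


def daily_extra_scan_hours(extra_mb: float, developers: int,
                           scans_per_dev_per_day: int) -> float:
    """Total extra scan time per day across a team, in hours."""
    total_seconds = extra_scan_seconds(extra_mb) * developers * scans_per_dev_per_day
    return total_seconds / 3600


# Hypothetical organization: these inputs are assumptions, not report data.
hours = daily_extra_scan_hours(extra_mb=500, developers=200, scans_per_dev_per_day=3)
print(f"Extra scan time per day: {hours:.1f} hours")
```

Under these assumed inputs, 500 MB of avoidable bloat adds up to about 8.3 hours of extra scan time per day, which is how a 50-second delay stops being trivial at scale.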

Behind all of this research, we wanted to see if we could make a difference in developer experience by reducing complexity, automating reuse and making it more intelligent, and creating ongoing, continuous optimization cycles that enable trusted containers to run faster.

In addition to developer experience, we discovered many other facets that apply to engineering delivery and security. We found that size maps directly to operational costs, and the report dives into how and why bloated containers are a time and cost sink. Growing complexity impacts delivery and velocity: it takes a high level of expertise to understand and map everything that is inside your containers and ultimately running in production, including required permissions, licenses, redundant code, and much more. In addition, with many redundant and superfluous packages shipped to production, the attack surface grows significantly and can create unnecessary risk for the organization.

In our next post, we’ll dive into each pillar and review the important takeaways and data you should be aware of as a developer using images from public registries. We’ll highlight some of the common gotchas and provide tips for overcoming these challenges, enabling faster, more production-ready, and more secure containers for every developer on the team.

Bio:

Ayse Kaya, Senior Director of Strategic Insights and Analytics at Slim.AI, and her team of data storytellers have released a first-of-its-kind report on the state of the most popular containers on Docker Hub, with an extensive analysis of the 130 most pulled containers. Ayse is a data artist whose vision is to bridge the gap between technical and executive teams through graphical representations of data. She brings years of experience in cloud and supply chain security, as well as in physical industrial engineering, applied mathematics, and queuing theory, in both the physical and operational worlds. Together, these form the backbone of the Top Public Containers Report 2021, where mathematics and statistical research meet engineering and security, with a particular focus on the threats your supply chain introduces.

Download the full 2021 Container Report (registration required)

Want to discuss these findings with fellow container enthusiasts? Join us on Discord at: https://discord.gg/uBttmfyYNB.
