Container Insights: Dissecting the World's Most Popular Containers
Sep 28, 2022
At Slim.AI, we care about what’s inside your containers. Indeed, we process hundreds or thousands of containers on our platform every day and are constantly trying to understand what makes containers safe, secure, and easy for developers to work with.
This “application intelligence” is what helps us better understand what can be removed from a given container and what needs to stay for the container to operate. With an increased focus on supply chain security these days, knowing what’s in your container and shipping only what you need to production is more important than ever.
Since we began the company in 2020, our data research team has been analyzing the most popular public containers, collecting data on every new version we see in our system and running a variety of container analysis tools against them to better understand the challenges developers, security teams, and technology leadership face in the cloud-native landscape.
In this series of blog posts, along with a series of associated conference talks culminating in a Keynote talk at KubeCon North America 2022, I’ll be sharing our findings in hopes that we can shed light on the modern container landscape and suggest some ways we can make it safer, more accessible, and less complicated for developers to adopt.
How We Developed Our Container Intelligence
Gathering data and insights on containers has helped give teams working with containers the development direction they need to solve some really lucrative problems in the container landscape today.
I am Head of Strategic Insights and Analytics at Slim.AI, a developer-focused startup where containers are our world. Our focus is developer experience around container best practices: container enthusiasts use our various tools every day, scanning containers, optimizing them, and sharing their experience with us. We thought it would be interesting to find out what's inside the public images that serve as starting points for nearly all modern software development. So, we looked. Here's what we found.
The Methodology & Purpose: How We Chose the Top Public Containers
This raises the question: why these 130 containers out of the 10M+ containers on Docker Hub? We started with these 130 containers for the 2021 report, but have since expanded to look at an even wider variety of images. We have also started looking at specific tags (like latest, slim, alpine) and tracking changes to a single container (say, node:latest) over time for longitudinal studies.
After extracting the data from Docker Hub, the Slim.AI data team found that these 130 containers accounted for over 31 billion of Docker Hub's more than 120 billion container pulls in 2020, a total that more than doubled the following year, to over 318 billion pulls in 2021. These containers include the most popular build tools, programming languages, data stores, and DevOps tools, among others.
Even more compelling is the fact that among these 130 containers are images that have been pulled individually more than one billion times.
In addition to the quantitative data, we also added a layer of qualitative data through interviews with developers and container enthusiasts. In these conversations we wanted to understand which containers and tools they use in their stacks, how they package and ship applications, and their risk appetite, or even their risk awareness.
The main driver behind investing in this report was, first and foremost, intellectual curiosity. How developers experience containers in the wild, and the challenges they run into trying to make them both easier to work with and more secure, is fascinating. At Slim.AI, we believe developer experience needs to be at the forefront of every developer tool. With the prevalence and exponential growth of container use, developer experience becomes central to scalability and to the continued wide adoption of containers. As vulnerabilities are discovered and the attack surface grows, this also affects speed, performance, and deployment frequency, which have become important metrics in engineering organizations.
What We Found: A Complex Landscape
To shine a light on all of these factors, our report focused on three key pillars of container adoption and use as they affect developer experience (and even we were surprised by the results):
- Size & Scan time: We found a nearly perfect correlation between these two variables, validating the hypothesis that bloated containers are a time sink for your CI/CD pipelines.
  - For every 500 MB added, we saw a 50-second increase in scan time.
  - This number may seem trivial for shipping a single container to production, but scale changes the entire dynamic. In a typical organization where thousands of images are in use and hundreds of developers ship images multiple times a day, this means real productivity losses for companies who aren’t optimizing their container
- Complexity hinders clarity even for experts:
  - We did a component analysis looking at all the relevant variables, including packages, shells, libraries, licenses, and special permissions.
  - We were expecting large outliers, but it turned out that even the averages were surprisingly high. It is typical to see hundreds of packages even in small, special
- Attack surface is more than just a vulnerability count:
  - Don’t get us wrong: the vulnerability counts were mind-blowing. Some of the popular containers we looked at had more than 2,000 known vulnerabilities in them.
  - But what was really surprising was the severity distribution of these vulnerabilities: 20% of them fell into the high or critical severity categories.
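To make the first pillar concrete, here is a rough back-of-the-envelope sketch of how the size-to-scan-time relationship adds up at scale. Only the slope (about 50 seconds per 500 MB) comes from the findings above. The function names, the assumption that the relationship stays linear, and the workload figures (200 developers, 5 scans each per day) are hypothetical and chosen purely for illustration.

```python
# Back-of-the-envelope sketch, not a measurement tool.
# From the report: roughly 50 seconds of extra scan time per 500 MB added.
# Assumption: the relationship stays linear across image sizes.
REPORTED_MB_STEP = 500
REPORTED_SECONDS_PER_STEP = 50


def extra_scan_seconds(added_mb: float) -> float:
    """Estimated extra time for ONE scan of an image carrying added_mb of extra size."""
    return added_mb / REPORTED_MB_STEP * REPORTED_SECONDS_PER_STEP


def daily_extra_scan_hours(added_mb: float, scans_per_day: int) -> float:
    """Estimated extra scan time per day, in hours, across all scans."""
    return extra_scan_seconds(added_mb) * scans_per_day / 3600


if __name__ == "__main__":
    # Hypothetical workload: 200 developers, each triggering 5 scans a day.
    scans = 200 * 5
    hours = daily_extra_scan_hours(added_mb=500, scans_per_day=scans)
    print(f"~{hours:.1f} extra scan hours per day for 500 MB of bloat")
```

Under these assumed numbers, a delay that looks negligible for a single image becomes many hours of pipeline time every day, which is the scale effect described above.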
Behind all of this research, we wanted to see if we could make a difference in developer experience by reducing complexity, automating reuse and making it more intelligent, and creating ongoing, continuous optimization cycles to enable trusted containers to run faster.
In addition to developer experience, we discovered many other facets as they apply to engineering delivery and security. We found that size maps directly to operational costs, and the report dives into how and why bloated containers are a time and cost sink. Growing complexity impacts delivery and velocity, because a high level of expertise is required to understand and map everything inside your containers, and ultimately everything running in production, including required permissions, licenses, redundant code, and much more. In addition, with many redundant and superfluous packages shipped to production, the attack surface grows significantly and can create unnecessary risk for the organization.
In our next post, we’ll dive into each pillar and review the important takeaways and data you should be aware of as a developer using images from public registries. We’ll highlight some of the common gotchas and provide tips for overcoming these challenges, so that every developer on the team can ship faster, more production-ready, and more secure containers.
Ayse Kaya, Senior Director of Strategic Insights and Analytics at Slim.AI, and her team of data storytellers have released a first-of-its-kind report on the state of the most popular containers on Docker Hub, with an extensive analysis of the 130 most pulled containers. Ayse is a data artist whose vision is to bridge the gap between technical and executive teams through graphical representations of data. She brings years of experience in cloud and supply chain security, as well as in physical industrial engineering, applied mathematics, and queuing theory, in both the physical and operational worlds. All of these together are the backbone of this report, Top Public Containers Report 2021, where mathematics and statistical research meet at the intersection of engineering and security, with a particular focus on the threats your supply chain introduces.
Want to discuss these findings with fellow container enthusiasts? Join us on Discord at: https://discord.gg/uBttmfyYNB.