The common perception of software development maintains that the source code is the fundamental element of any application. As modern applications grow more complex, researchers are now eager to dive deeper and gain a wider and more comprehensive view of their application stack. Since standard techniques generated a limited and fragmented picture, the Research team at Enso Security decided to go back to basics. We initiated an in-depth assessment of the way we view the modern application code repositories with the objective of finding a new approach to maximize the data that can be extracted.
As outlined below, our research led us beyond the SBOM (Software Bill of Materials) towards an even more comprehensive approach: something like an application stack bill of materials. This approach allows us to provide an extra layer of context and enrich our understanding of the application stack in ways that were not available to us before.
Understanding applications by understanding source code
To start, we work off of a few basic and general truths regarding applications.
First, we know that a modern application’s code-base is far more than just the data stored in its code repositories, and consists of many configurations, automations, and dependencies.
Second, we acknowledge that one of the biggest issues in the field of AppSec is discovery: the ability to search in a relevant way across all data. Today’s discovery solutions provided by source code services are lacking in query language, in the sorting of results, and in the ability to apply complex queries to file and path names *and* to the file’s content at the same time. Open-source and commercial solutions like Sourcegraph, OpenGrok, and FishEye, along with the new search capabilities in GitHub, have attempted to tackle these gaps.
Starting with the SBOM
Although awareness has skyrocketed due to major software supply chain attacks like SolarWinds and the recent Log4j logging-library vulnerability, the SBOM (Software Bill of Materials) is not a new concept. US President Joe Biden’s May 2021 Executive Order made it a requirement for software vendors contracting with the federal government to provide an SBOM. Existing solutions such as SAST and SCA can provide such a list of software dependencies based on the application’s code. AppSec teams must be able to produce a bill of materials, give a quick verdict on whether specific software is in use, and point to every instance of its use.
But building an SBOM is not as simple as putting a list of ingredients together. Given the complexity of the application stack, software can range from a Java library to an npm package, a Docker image, or a WordPress plugin. While this might not sound like much of a challenge, taking an in-depth look at each of these can get complicated.
Case study: Log4j and dependency confusion
Let’s take the search for the vulnerable log4j Java library as an example. The conventional approach to ensuring that all bases are covered would be to review all pom.xml, build.gradle, ivy.xml, *.sbt, and *.bzl files. However, are file naming conventions the only ones that should be relied upon? Searching file contents for strings such as `org.apache.logging.log4j` would also be advisable, but is still far from exhaustive.
Another difficulty arises when the project uses another library that brings in log4j on its own, for example a file that contains any variation of `lombok.extern.log4j.Log4j2`.
Such variations can easily be missed or ignored when using just a regular scan. An alternative approach used by several solutions is to scan the application’s compiled artifacts in order to avoid these gaps. This, however, may produce false positives and false negatives, and it makes pointing to the original root cause that requires the fix much more complex, sending the team back to square one and to searching in the code.
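The combined search described above, by manifest file name and by file content, can be sketched roughly as follows. This is a minimal illustration, not Enso’s implementation: the function names, the `files` mapping of path to text, and the exact regular expressions are assumptions made for the example, and the pattern lists are deliberately not exhaustive.

```python
import fnmatch
import re
from pathlib import PurePosixPath

# Manifest file-name conventions taken from the text above.
MANIFEST_PATTERNS = ["pom.xml", "build.gradle", "ivy.xml", "*.sbt", "*.bzl"]

# Inside a manifest, any mention of log4j is worth reporting (assumption).
MANIFEST_HINT = re.compile(r"log4j", re.IGNORECASE)

# Outside manifests, look for the two strings mentioned in the text.
SOURCE_HINTS = [
    re.compile(r"org\.apache\.logging\.log4j"),
    re.compile(r"lombok\.extern\.log4j\.Log4j2"),
]


def is_manifest(path: str) -> bool:
    """Return True if the file name matches a known manifest convention."""
    name = PurePosixPath(path).name
    return any(fnmatch.fnmatchcase(name, p) for p in MANIFEST_PATTERNS)


def find_log4j_references(files: dict) -> list:
    """Scan a {path: text} mapping and return (path, line_no, kind) hits."""
    hits = []
    for path, text in files.items():
        manifest = is_manifest(path)
        for line_no, line in enumerate(text.splitlines(), start=1):
            if manifest and MANIFEST_HINT.search(line):
                hits.append((path, line_no, "manifest"))
            elif any(p.search(line) for p in SOURCE_HINTS):
                hits.append((path, line_no, "source"))
    return hits
```

Even a sketch like this shows why name-based scanning alone is not enough: the Lombok import would only be caught by the content patterns, and transitive dependencies that never appear in the repository text would not be caught at all.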
Another example of an exploit that can fall between the cracks despite a comprehensive visibility approach is dependency confusion.
Every organization has its own internal packages, like SDKs or frameworks, which are heavily used in the organizational code. A simple mistake can open the door to a dependency confusion attack: referencing a private npm package in your package.json file without the @your-company-name scope before the package name, when that unscoped name has not been claimed in the public npm registry. An attacker can then publish a malicious package under the same name. The next time the simple command `npm install` runs in a project, the malicious package may be installed instead of the private package.
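One way to look for this mistake is to flag dependencies that are known to be internal but are referenced without a scope. The sketch below assumes you already maintain an inventory of internal package names (`internal_names` is hypothetical, as is the function name); a fuller check would also consider whether the name is claimed in the public registry and how the project’s registry settings are configured.

```python
import json

# package.json sections that can pull packages in at install time.
DEPENDENCY_SECTIONS = (
    "dependencies",
    "devDependencies",
    "optionalDependencies",
    "peerDependencies",
)


def unscoped_internal_dependencies(package_json_text: str, internal_names: set) -> list:
    """Return (section, name) pairs for internal packages used without a scope.

    `internal_names` is an assumed, organization-maintained inventory of
    private package names; it is not something npm provides.
    """
    manifest = json.loads(package_json_text)
    findings = []
    for section in DEPENDENCY_SECTIONS:
        for name in manifest.get(section, {}):
            # Scoped names look like "@your-company-name/package".
            if not name.startswith("@") and name in internal_names:
                findings.append((section, name))
    return sorted(findings)
```

Each finding is a candidate for renaming under the company scope or for reserving the unscoped name publicly.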
Going beyond the SBOM: Create an infrastructure that allows you to understand everything
How then can a systematic approach be used in order to ensure that no stone goes unturned?
Once the objective was appropriately defined (in this case, comprehensive visibility), Enso built a complete infrastructure for fast and complex querying across the entirety of a customer’s source code. With a dedicated infrastructure built to facilitate this process, Enso added an internal API with specific requirements for various use cases. For example, so that results remain actionable at scale, each result is required to include the author of the change that introduced the vulnerability, in order to identify the relevant developer and remediate.
Once the infrastructure was in place, Enso created an analysis for each dependency type with specific manifest-file criteria (e.g. file name and path conventions, and content) and corresponding fallback logic (e.g. if an npm lockfile exists, use it; if not, resort to package.json, etc.).
The team was then able to map this data per project per repo in order to support monorepos with multiple projects. The next phase included mapping and supporting the next technology, and adding it to the growing data index.
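The npm fallback rule mentioned above can be sketched as a small priority list applied per directory. The names `NPM_MANIFEST_PRIORITY` and `select_manifests` are illustrative, and treating each directory as its own project is a simplifying assumption.

```python
from pathlib import PurePosixPath

# Most precise source first: the lockfile pins exact versions,
# package.json only declares ranges.
NPM_MANIFEST_PRIORITY = ["package-lock.json", "package.json"]


def select_manifests(paths: list) -> dict:
    """Pick, for every directory, the best available npm manifest.

    Returns {directory: manifest_file_name}.
    """
    by_dir = {}
    for path in paths:
        p = PurePosixPath(path)
        if p.name in NPM_MANIFEST_PRIORITY:
            by_dir.setdefault(str(p.parent), set()).add(p.name)

    chosen = {}
    for directory, names in by_dir.items():
        for candidate in NPM_MANIFEST_PRIORITY:
            if candidate in names:
                chosen[directory] = candidate
                break
    return chosen
```

Because the selection is keyed by directory, a repository containing several projects yields one chosen manifest per project rather than one per repository.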
The data was then indexed to allow easy and fast search capability across all manifests in all resources.
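As a rough illustration of what such an index might look like, the sketch below keeps an in-memory mapping from package name to every place it was found. The `Occurrence` fields and the `ManifestIndex` class are hypothetical; a production index would typically live in a dedicated search or database engine rather than in a dictionary.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    """Where a package was seen (all fields are illustrative)."""
    repo: str
    project: str
    manifest: str
    version: str


class ManifestIndex:
    """Minimal inverted index: package name -> list of occurrences."""

    def __init__(self):
        self._by_package = defaultdict(list)

    def add(self, package: str, occurrence: Occurrence) -> None:
        # Normalize case so lookups are case-insensitive.
        self._by_package[package.lower()].append(occurrence)

    def find(self, package: str) -> list:
        """Exact lookup by package name."""
        return list(self._by_package.get(package.lower(), []))

    def search(self, fragment: str) -> dict:
        """Substring search across all indexed package names."""
        fragment = fragment.lower()
        return {
            name: list(occurrences)
            for name, occurrences in self._by_package.items()
            if fragment in name
        }
```

With every manifest from every repository fed into one structure like this, a question such as "where do we use log4j?" becomes a single lookup instead of a scan.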
But we didn’t stop there.
In the process of building this infrastructure, we found that a great deal of data that is valuable for organizational security can be discovered this way. To dig deeper, we started adding manifest files that describe the deployment state found in DevOps-related projects, like Chef Infra or Terraform, NGINX configs, and CI configurations like Jenkins and CircleCI. What we ended up with was a complete bill of materials of the entire application stack. This capability allows us to query not only the software composition, but also CI/CD configurations, and even specific and unique organizational configurations. Moreover, we can quickly add any new configuration type by implementing a manifest query and parser for it.
The SBOM and beyond - A challenging but worthwhile endeavor
When the Log4Shell vulnerability was disclosed, Enso’s customers were well prepared to face it. In a matter of seconds, clients were provided with a detailed report that laid out all log4j inventory by repository and by project, and even attributed it to the specific developer. Furthermore, the data was accessible and ready to measure exactly when the code update for mitigation was introduced and by whom.
Applications, just like humans, are complex. Listing the ingredients with an SBOM is an important step in the process. But in order to get a full picture of what our applications contain, we need to go deeper and discover what the bill of materials is trying to tell us: the context of each package, its business context, how it relates to the health of the AppSec program, which project it relates to, the implications of where it’s deployed, etc.
Imagine your SBOM as the list of nutrition facts on a box of cereal: the ingredients, the amount of sugar, calories, carbohydrates, etc. But this is just a list. What value does it bring you? What are the implications of this list on your health? What are the side effects of 3g of saturated fat per serving? How does a high serving of iron affect your immune system? Without context and insight, this is just a list of words. In order to gain insight from the list of ingredients, we have to go deeper.
Building out an organizational infrastructure as detailed above is no easy endeavor. Many organizations are not at a maturity level in their application environment to require or undertake something like applying an application stack bill of materials, let alone creating one. But if your organization is ready to get knee-deep in the data, expose what is hidden, and provide an extra layer of context and enrichment to your code, it is a worthwhile endeavor.
For the next Log4j, you will be more than prepared.