In software development, a small coding error can result in a critical vulnerability that ends up compromising the security of an entire system or network. Many times, a security vulnerability is not caused by a single error, however, but rather by a sequence of errors that occur during the course of the development cycle: a coding error is introduced, it goes undetected during the testing phases, and available defense mechanisms do not stop a successful attack.
Security must be a priority in all phases of software development. Effort should be aimed at preventing software vulnerabilities—detecting them before release, of course, but also limiting their practical impact (for example, by reducing the product's attack surface). At Microsoft, such a holistic approach to security is implemented through the Security Development Lifecycle (SDL), which covers all major phases of software development, including educating developers, improving design, employing coding and testing practices, and preparing for emergency responses after the release of a product, as you can see in Figure 1. SDL is not the only way to approach code review, but it does form the basis for much of what we'll cover here.
In this article we will discuss manual security code reviews performed by developers or security experts. In a process defined by the SDL, such efforts usually take place during a security push or penetration-testing engagement and are associated with a final security review. Coding errors can be found using different approaches, but even when compared to sophisticated tools, manual code reviews have clearly proven their value in the areas of precision and quality. Unfortunately, manual code reviews are also the most expensive to execute.
We also intend to discuss in detail the advantages and disadvantages of security code reviews in the context of large software projects. This article has been prepared based on experiences gathered over time, through reviews of major products released by Microsoft over the last few years.
Software Security Vulnerabilities
A software product of nontrivial size and complexity should never be assumed free of security vulnerabilities. All possible steps should be taken to limit the number of coding errors and reduce their practical impact, but something is always missed. Software errors that affect security (referred to as vulnerabilities) can exist at different levels of the application and be introduced during different phases of the development cycle.
Vulnerabilities are not limited to code. They can be introduced as early as the requirements definition in the form of a requirement that cannot be implemented in a secure manner. The basic design of a product may also contain flaws. For example, inappropriate technologies may be selected or their use may be incorrect from a security point of view. Ideally all of these problems will be identified by design reviews or threat modeling during the early stages of product development.
Security code reviews are primarily aimed at finding the code-level problems that still cause a majority of security vulnerabilities. These efforts may result in the identification of design issues as well, but such issues could be related to needed improvements in threat modeling and other aspects of the development process. Source code reviews can also be conducted with non-security-related priorities. But in this specific context, the goal is to find code vulnerabilities that could be used to either break security guarantees made by the product or to compromise the security of the system.
A coding error that can potentially cause security problems (such as problems due to lack of validation) must fulfill specific conditions to constitute a security vulnerability. There must be a security boundary to attack, and an attacker needs to have some level of control over the data or the environment. Problems that may exist in code that is executed within the same security context as the attacker offer no potential privilege gain in exploiting the vulnerability. In other cases, vulnerabilities exist, but they are located in code that cannot be executed because it is not accessible by the attacker. These coding errors, although they may affect product reliability, should not be considered actual vulnerabilities. The ultimate goal of security code reviews is to find code vulnerabilities that are accessible by an attacker and that may allow the attacker to bypass a security boundary.
Note that the accessibility of a vulnerability is not equivalent to its exploitability; a successful attack may still be mitigated by platform enhancements such as the /GS flag, the /SafeSEH flag, or Address Space Layout Randomization (ASLR).
The exploitability of a code vulnerability is not in the scope of investigations performed during code reviews, primarily because it is usually impossible to prove that available exploitation mitigations are sufficient. The responsibilities of a code reviewer end with confirmation that a code vulnerability exists and that it is appropriately triaged. From that moment the bug should be considered simply another problem that requires a fix.
Identifying code vulnerabilities is a primary goal of security code reviews, but there can be additional outcomes from the effort. A reviewer may provide feedback about overall code quality, redundancy, dead code, or unnecessary complexity. Reviewers may also deliver recommendations for improvements in reducing the attack surface, data validation, code clean-up, or code readability (such as improving comments). However, since documenting such outcomes takes time, you should decide before starting whether results should also include such recommendations or whether efforts should focus solely on identifying security problems.
Finding Coding Errors
There are different approaches to finding code errors. Since each has both unique advantages and practical limitations, it is important to understand the difference between code reviews and other options. For the purposes of this article, we will assume that both source code and design documentation are available, and that a so-called "white-box" analysis of the product is performed (an internal analysis, as opposed to a "black-box" approach that focuses only on externally visible behavior).
Code review can be described as a manual and static approach to finding coding errors. Using similar descriptions, two other popular approaches can be described as an automated static approach and an automated dynamic approach. The automated static approach usually takes the form of static code analysis tools that operate on source code in an attempt to identify known types of problems defined using patterns. PREfix and PREfast are examples of this approach. The automated dynamic approach is represented by automated code testing techniques (such as fuzzing), which are primarily focused on files, protocols, and APIs. Although these solutions can be applied also in a black-box approach, much better results can often be achieved if information about internal elements such as file formats is available and used appropriately.
Each approach has certain practical advantages and limitations. Static code analysis tools allow more code to be processed through automation, but findings are strictly limited by the set of predefined patterns for known types of problems. The results may often also contain a large number of false positives that make addressing issues difficult with limited resources. Fuzz testing can be easily automated and conducted on a continuous basis, but it operates in at least a partially random manner and may have problems with reaching deeper parts of the code. In most cases it is relatively easy to conduct basic fuzzing, yet it is much more difficult to achieve complete coverage of critical code paths.
Compared to automated methods, the advantages and disadvantages of manual source code reviews are connected primarily with direct involvement of a human reviewer. Results of efforts may differ significantly because they depend on a participant's experience with specific technologies, architectures, and scenarios. Generally, it is wise to assume that a human reviewer still has advantages over a machine-based solution. The reviewer can learn throughout the process, understand the context of software components, and interact with designers and developers. Additionally, a reviewer can provide feedback that is not limited only to a detailed issue report, but can also include high-level recommendations.
At the same time, however, a reviewer is susceptible to human error, fatigue, or boredom. The scope of a review is usually limited, and evaluating the quality of the review may be difficult since it is usually based on the subjective confidence of the reviewer.
These disadvantages can be overcome, however. One can design a task-specific review, aimed at analysis of a code block in multiple passes for only one type of error at a time. Using another approach, multiple reviewers can perform independent reviews of a critical piece of code, thus limiting the probability of human error. Code reviews are one of the specific cases where redundancy has huge potential value as it allows overcoming the limitations of human involvement.
Manual code review should never be considered as the ultimate solution for finding code vulnerabilities or as a replacement for other approaches, but rather as a complementary solution. Code reviewing should never be introduced to replace threat modeling, fuzzing, or enforcing coding best practices. Compared to other approaches, code reviewing is usually more expensive, so it should be used primarily in the scope of the most critical areas and problems where the effectiveness of other approaches is limited.
The Code Review Process
Security code review is most successful if it is planned and executed in the context of other security-related efforts such as threat modeling (see Figure 2). Additionally, the results from code reviews can show additional value by improving other security tasks such as testing and design.
Figure 2 Code Reviews
The value of good threat models for code reviewing can hardly be overestimated. To conduct a successful code review, a reviewer must have a good understanding of a product's goals, its design, and the technologies used for its implementation. The first two areas are also covered by threat modeling, and, although they focus on finding high-level problems, they include research also useful in code reviewing. These two kinds of security efforts are generally complementary; threat modeling helps to identify a critical area of code that then becomes a subject of detailed review. Results from a code review can likewise be used to validate (or question) security assumptions specified in a threat model.
In an ideal situation, a security code review would begin with a review of the quality threat models and design specifications, then move to source code. Before beginning, all development work on code within the scope of the review should be completed. To catch the simpler issues, available security tools should be run against the code before manual review begins. Finally, to avoid finding already-known issues, all previously identified security bugs should be fixed.
The security team should plan code reviewing efforts sufficiently early in the development lifecycle to identify the most suitable time for its execution. An actual plan should also address organizational decisions that might affect the overall return on this investment. For example, there are three likely options for selecting participants in a code review. First, reviewers can be external to the code and product (a code review expert from a penetration testing team). Second, a reviewer can be external to the code but familiar with the product (a developer from another team in the same organization). Finally, reviewers can be familiar with both the code and the product (such as the developer working on the code).
Each option has advantages as well as limitations. In the case of reviewers external to a product, usually skilled security experts are selected and they can make a big difference thanks to their unique experience and perspective. However, in most cases they will have a lower level of understanding of a product's internals and implementation details compared to members of the product team. Developers who wrote the specific fragments of code usually understand them best, but they are also the most susceptible to creator's bias and might therefore overlook significant coding errors.
None of these options is an ultimate solution that can be automatically applied to every case. Code reviewing depends more on human participants than technical solutions and thus it needs to be adopted by the team that is assigned to the effort. Some teams achieve the best results by discussing and analyzing the code in group meetings. In other cases, developers obtain the best results by walking through the code on their own. The team should choose whatever strategy allows participants to be effective in reviewing a product. This general rule applies not only to planning and organizing a code review, but to all challenges that need to be faced in such a process.
Prioritizing Review Efforts
The most important rule for code reviewing is realizing that there is never enough time to review all the code you would like to review. One of the biggest challenges, therefore, is selecting and prioritizing the product's components or code fragments that belong in the primary, or at least initial, scope of analysis. To achieve this goal, it is necessary to understand the design of a product and the role of specific technologies used to implement it. The prioritization is a high-level analysis and it defines the framework for the whole engagement.
The prioritization begins with analysis of threat models and other documentation available for a product. If properly done, a threat model can provide a lot of useful data about relations between different components, entry points, security boundaries, dependencies, and assumptions in the design or implementation. Since the goal of the prioritization is to gain understanding of the implementation details, information from a threat model must then be compared against the code itself, analyzing information about the code's structure, quality, or development process.
Prioritization can be divided into four major tasks, which are presented in Figure 3 along with examples of questions that may be helpful in finding required information.
Figure 3 Code Review Prioritization Tasks and Questions
Understand the Environment and Technologies Used to Implement the Product
Is it a Trusted Computing Base (TCB) component: kernel driver, subsystem, or privileged service?
Is it started by default, co-hosted, or running as a dedicated process?
Is it a reusable plug-in, driver, codec, protocol handler, or file parser?
Does it load and directly invoke other code (third-party libraries)?
What underlying technologies are used (RPC, DCOM, ActiveX, WMI, SQL)?
What languages (C/C++, C#, Visual Basic) and libraries (STL, ATL, WPF) are used?
Enumerate All Sources of Untrusted Input Data (Entry Points)
Is it available via network (TCP, UDP, SMB/NetBIOS, named pipes)?
Does it use IPC communication (LPC, shared sections, global objects)?
Can it be driven programmatically (automation or scripting)?
What resources does it use (registry, files, databases)?
Does it have a UI, parse command-line arguments, or use environment variables?
Does it directly communicate with external hardware (USB driver)?
Determine Who Can Access Entry Points and under What Conditions
Is it a security boundary that allows for potential elevation of privilege?
Is this entry point limited to remote, local, or physical access?
Are there access restrictions on transport, interface, or function level?
Is authentication required (anonymous, authenticated, service, admin)?
What mechanisms are used for authentication, securing secrets, and so on?
Are there any other constraints (allow/deny lists, certificates)?
Include Nontechnical Context of Code
Is it new or legacy code? Is it still in development or in maintenance mode?
Was the security context changed (code moved from user to kernel mode)?
What is the history of security efforts in this product team or organization?
What is the history of the product's security?
What is the security awareness of the team or organization?
What are the results from previous code reviews?
What kind of problems have been detected using automated tools? Where?
The first task is related to understanding the technological context of the code. This context covers not only the specific technologies that are used in a product, but also operating system and third-party dependencies as well as tools used in development. The goal of this task is to identify relationships between a product and other systems, applications, or services. Based on these relationships, it is possible to determine what components a product relies on as well as what other software depends on your product. In the context of security, these relationships determine how a product affects the rest of the system and how it may be affected by it. Some high-risk areas become visible through this process.
The high-risk areas are usually associated with concepts such as security boundaries and potential attack vectors. In practice, it all can be reduced to data that can be controlled by an attacker, which should be considered untrusted, and entry points from which this data can come. The team must analyze each of these entry points in relation to guarantees and assumptions about incoming data. One key question is when and how data is validated. This question can often be related to actual data characteristics (can it change asynchronously or be sent out of order?). It can also be related to characteristics of an entry point, such as whether it is available by default (a network service or programmatic interface) or whether it is created as a result of a user's action (a click or opening a data file).
The characteristics of an entry point are also in the scope of the next big task: analysis of trust relationships and access control. A product may have many different entry points, but its actual attack surface is defined only by those that are accessible to an attacker across a security boundary. Each of the entry points should therefore be investigated to verify who can access it and under what conditions. Unauthenticated access to wide functionality is obviously more interesting for an attacker than entry points accessible only with administrative privileges. In the latter case, the review may be limited to just the piece of code responsible for actual authentication (including the code preceding it).
Last but not least, nontechnical sources can also provide data useful for prioritization of code review. Code is typically created by teams of developers; teams usually change over time as new members join and others leave. Code also has its own security history, sometimes not only in the scope of a specific product. In most cases it can be assumed that newer code is of better quality than legacy code, but there can be exceptions. Talking with developers, reviewing documentation from previous reviews, or sampling code may provide very useful input. To achieve the best utilization of available resources, the security team should use any data that can help reviewers to remain effective throughout the process.
Code Reviewing Tactics
The challenges connected with code reviews of nontrivial applications result primarily from the size and complexity of software. Modern products are created using a variety of different technologies, programming models, coding styles, and code management policies. In practice, the complete and up-to-date documentation is rarely available. A reviewer usually faces much more information than can be processed by a human in an acceptable timeframe. To deal with information overload during the review of complex products, certain practices, referred to as tactics, have been developed. We'll discuss the two most common.
The first tactic, which we call contextual code analysis, is the natural consequence of prioritization and high-level analysis efforts. It relies on precise and context-aware analysis of data flows in areas of code directly involved with processing untrusted data that was passed across a security boundary. The advantage of such a targeted approach is the possibility of focusing on the code that is more likely to be attacked and thus achieve good coverage in the scope of the most critical code paths.
Contextual code analysis can be divided into two passes. The first one starts with an entry point and an understanding of the level of control an attacker has over input data. From this point, the developer conducts a data flow analysis aimed at tracking how variables and data structures change with execution of the program. The analysis is performed on a per-function basis, starting with input arguments, then moving through all execution paths, simultaneously monitoring changes in states of variables and structures.
Every function that is called and takes data controlled by an attacker (coming from an entry point and not yet validated) should be analyzed in the same way. Similarly, any propagation of data to global data structures should be flagged so that every reference to it can later be analyzed. At this point, navigation through the code is aimed at improving understanding of code, discovering its structure, and flagging places for further detailed investigations.
Obviously in the case of security code reviews, one needs to pay special attention to code areas that are more likely to have security problems. Examples of such places are listed in Figure 4.
Figure 4 High-Risk Code Paths for Review
User identification, authentication, and data protection
Authorization and access checks
Code that preprocesses untrusted data (network packet parsers, format readers)
Unsafe operations on buffers, strings, pointers, and dynamic memory
Validation layers ensuring untrusted data is in valid format
Code responsible for conversion of untrusted data to internal data structures
Logic involved with interpreting untrusted data
Places making assumptions about the data itself as well as behavior of its source
Code involved in handling error conditions
Usage of OS resources and network (files, registry, global objects, sockets)
Problematic areas typical to the environment in which the code executes
Usage of problematic API or violations of API contracts
If specific code doesn't process any data that could originate from an entry point or processes data that was already validated, it stops being relevant from a security point of view and the reviewer should move forward to other areas. At the end of the day, a reviewer should have a general understanding of the component's code as well as the list of locations flagged as potentially interesting places in the code. Using this list, the reviewer can begin the second pass of contextual analysis, which is focused on an in-depth investigation of selected areas of code.
We call the second code reviewing tactic pattern-focus code analysis. In this case, a reviewer starts at an arbitrary place in the code and looks for known types of potential security vulnerabilities. Although different supporting tools may be applied, "known" doesn't have to refer to any formal patterns, but rather to the reviewer's individual experience and intuition. For each vulnerability candidate, a reviewer follows up all code paths in order to determine whether the coding error actually represents a vulnerability—processing data that can be controlled by an attacker over a security boundary. If correct validation is identified at any level, the error should not be considered a security vulnerability, although it still may be identified as a defense-in-depth or non-security issue that requires a fix.
Pattern-focused code analysis can be used with limited understanding of an application, since a reviewer can focus on the quality and correctness of selected fragments of code rather than their role or location within a system. This allows for the coverage of a significant amount of code in the scope of specific patterns such as bad code constructs or problematic API calls. However, analyzing potential coding errors in isolation from the context of an entire application usually does not give enough information to determine whether it is a real problem. This may lead to overlooking serious bugs or fixing nonrelevant ones, posing additional long-term consequences to the application. If a bug is described only in local context, it may be tempting to introduce a local fix instead of a more complete validation at a higher level. Such an approach to fixing coding errors may result in redundant validations, increased chaos, decreased performance, and general problems with code management.
Types of Vulnerabilities
During security code reviews, try to maintain an attacker's point of view and look at everything that could be in the attacker's control. Conducting code reviews should not be limited to looking for dangerous APIs and data-copy operations. When it comes to security, the details are significant and history shows that the smallest flaws can be exploitable. This section contains examples of some types of coding errors, but there is no way to cover all of them here. The "Security Resources" sidebar will lead you to more comprehensive sources of information on likely vulnerabilities.
The general concept of security code vulnerabilities is often associated with buffer overflows. In the context of processing untrusted data, coding errors related to range and data type are still a very common source of security problems. This group of vulnerabilities is not, however, limited only to buffer overflows. The buffer overflow condition is connected with copying data to the stack, data segments, or the heap without concern as to their size, which may lead to writing beyond the defined range and hazardous situations resulting from potential overwrites of structures controlling program flow or other sensitive data.
Equally common and severe examples of coding errors related to range validation are out-of-buffer reads and writes. These specific problems often happen when using pointers received from a untrusted source (using pointers as cookies or handle values) or miscalculated indexes and offsets. They allow attackers to access arbitrary places of process memory such as data variables or function pointers and cause changes in program behavior including execution of arbitrary code.
In the scope of calculations, many transformations on numeric type variables (addition, subtraction, multiplication, and so on) may cause the calculation to leave its defined range and silently wrap, leading to integer overflows or underflows. This becomes a problem when the variable is used after transformation and at the same time somewhere else it is used either untransformed or transformed in a different way. As a result, existing checks may turn out to be insufficient, leading to buffer overflows or out-of-buffer reads and writes.
Other problems are related to the sign of numeric type variables. There are many abstract entities that a program deals with that have no defined meaning for negative values. For example, string byte lengths or array element counts do not really make sense as negative values—and using signed types to manipulate these quantities introduces a number of problematic cases.
Type casting (explicit or implicit) is another example of potentially unsafe transformation of data elements. Casting may make items bigger (in terms of bits) and hence may involve the propagation of signed bits into new values. Likewise, type casting could also cause the truncation of a value. Again, problems typically surface when the data item is used in a transformed state in one part of the program while used in an untransformed or differently transformed state in another.
There are specific security code vulnerabilities related to using dynamic memory. Program behavior with uninitialized data elements may be undefined. An attacker may be able to control some uninitialized variables by calling APIs to get or set the uninitialized variables prior to the API that initializes them. Returning uninitialized memory to an attacker can cause a number of problems. Since heap allocations come from a resource that may be shared by different threads servicing different security contexts (clients), sensitive information can leak from one client to the other. This may also provide an attacker with information useful in a broader exploitation of other security flaws, such as aiding the attacker in predicting addresses. Other problems may be related to double-freeing memory, using memory that was already freed, memory leaks, or memory exhaustion. The possible ramifications vary from denial of service and information disclosure to execution of malicious code.
A quite different group of security code vulnerabilities is related to synchronization and timing errors. Good examples are TOCTOU (time-of-check, time-of-use) race conditions. Security and sanitization checks are worthless if performed on data or resources that may be changed by an attacker after the check but before the actual use. Validation has to be performed on private copies of data that cannot change asynchronously. Resources need to be referenced properly to ensure they don't get deleted or replaced.
Multithreaded environments put especially strong requirements on code (services, kernel components) in terms of synchronization of access to shared objects and resources. Programmatic interfaces that allow attackers access to concurrent function calls or their asynchronous cancellation open a window for possible race conditions. Problems caused by missing locks, dropping locks and assuming preserved state, misuse of interlocked operations, or using a disjointed set of locks can lead to unsafe memory operations or deadlocks.
In the case of object lifetime management problems, if code makes a new object available to other threads by manipulating global data (publishing), it needs to be in a consistent (usually completely initialized) state. Proper managing of the lifetime of an object is determined by a proper reference counting mechanism. Broken object destruction and cleanup logic may give an attacker a way to unload code early (such as a plug-in) and free memory that is still in use.
Many security coding errors are not directly related to manipulations on untrusted data, but rather to its actual interpretation or its influence on program behavior and results. The classic examples are found in injection attacks that may occur when data submitted by an attacker is used to parameterize some other content such as a script, Web page, command line, or format string. By using special escape characters or control sequences, an attacker is able to force untrusted data to be interpreted as part of an executed script or a command. Problems arise also when untrusted data is used to construct names and paths to resources that are about to be created or used (files, sockets, registry, shared sections, global objects). Directory traversal or canonicalization issues make code vulnerable to all sorts of redirections that may result either in usage of resources that are under attackers' control or disclosure of sensitive information (for example, user secrets or network credentials).
Security vulnerabilities may also result from faulty assumptions about the data origin and destination (client id, network address, session id, or context handle), the order in which it is coming (network messages, API calls), and the amount (too much or too little data). Often, an attacker is able to control, to some extent, the environment in which code executes by creating named objects before the genuine product creates or uses them (name squatting attacks), filling up the disk space, or either blocking or redirecting network communication. Failure to handle these situations may make code vulnerable to elevation of privilege or denial-of-service attacks. Other common problems result from making assumptions about the technologies used by the code—either about their security guarantees (integrity of the communication channel) or about the inner workings of their APIs (providing incorrect combination of parameters and not checking return values instead of honoring API contracts).
Results from code reviews are often not limited to finding problems classified as code-level problems. Reviewers operate on the level of the actual implementation, and although they can benefit from using design-level documentation, including specifications and threat models, they are not supposed to rely on this data. Thus, code reviews still lead to identification of design-level issues or inconsistencies between the specification and implementation. The types of problems commonly found include exposure of dangerous functionality, incorrect implementation of protocol, usage of custom pseudo-security mechanisms (such as authentication schemes and access checks), or identification of ways to bypass security barriers.
A security code review cannot be considered successful if the security of a product is not improved. Code review should provide output that is useful for development teams to make meaningful changes to a product. Achieving this review-based improvement is connected with the two practical requirements of complete documentation and accurate triaging.
Documentation for each of the identified security code vulnerabilities should contain all details needed to locate and understand the issue. The details should include pointers to flawed code, an explanation of the problem, and justification for why this is a vulnerability. Adding recommendations for a fix is a useful practice, but selecting and preparing the actual solution is the responsibility of the code owners. If any data is missing or it is not clear why a coding error is a security vulnerability, additional research is required from the product team, which is not always possible due to resource or time constraints.
Another important requirement relates to accurate triaging of code vulnerabilities. If the severity of a problem is set too low, it may not be fixed by a product team. On the other hand, if severity is set too high, the problem may be selected for fixing instead of another problem with higher practical impact. The triaging process strongly depends on the quality of the security bug threshold, but also on an understanding of the priorities of code reviewers (who do the triaging) and developers (who act on the results of triaging). As we mentioned earlier, triaging security code errors should not be affected by the availability of exploitation prevention mechanisms.
Code reviewing efforts can provide more useful information than just a list of security problems. If possible, reviewers should also document code coverage, confidence in specific code areas, and general recommendations for code redesign and cleanup. Code reviews are also unique opportunities to enrich organizational knowledge, increase security awareness, and improve the effectiveness of security tools. Last, but definitely not least, development teams can use the results of code reviews to help prioritize future product security efforts.
We guarantee that software will always have security vulnerabilities, although the nature of those vulnerabilities and practical impact will change with time. Automated security tools are able to identify more and more coding errors, but some vulnerabilities will still be missed (either not detected or hidden under large numbers of false positives). Manual source code analysis is not a replacement for these tried-and-true tools, but it can often be advantageously integrated with them.
The manual code review approach is expensive, difficult, and highly dependent on the experience and commitment of the participants. However, in many situations the project requires this investment in order to obtain acceptable confidence about the security of a product or its critical components. An experienced human reviewer is still capable of identifying issues that would be missed by tools. As long as a human can be the cause of security problems, a human should also be a part of the solution.