Code PathFinder - Open Source CodeQL Alternative
What is Code PathFinder?
Code PathFinder is a code analysis tool that helps you find exact code patterns and paths in your codebase. While there are several ways to grep source code, having source code broken down into individual entities, building graphs & edges which help in establishing relationships between entities, imitates the way a human reads code.
How do security engineers interact with codebases today?
If you think about how engineers generally interact with a codebase, it typically follows this process:
- Start by searching for a symbol
- Resolve the symbol to an entity such as a class or function
- Find the entity’s definition
- Find the entity’s references across the codebase and often across multiple repositories
- Determine the flow of the code
- 5A. Have a source in mind such as user inputs, database, or a file or even network operations
- 5B. Have a sink in mind such as the above symbol’s definition
- 5C. Determine the flow of the code including method jumps, method calls, and method returns
- 5D. Identify if there are any blockers in between such as conditions, loops, etc.
- Identify the variables that are modified and the variables that are used within the flow
Representing this process technically as a graph can be more useful in finding the flow of the code. Moreover, the relationships as edges between entities can be used as conditions to focus on the paths that are relevant to the source and sink.
For example, to find a code pattern where the Socket
class is instantiated and the send
method is called on it, and to get all enclosing methods, you could use:
The above query will return all the enclosing methods of the send
method in the Socket
class and invoked calls to the send
method.
The entities such as MethodInvocation
, MethodDeclaration
, and ClassInstanceExpr
are called entities and are represented as nodes in the graph.
The edges between the nodes represent relationships between the entities.
How does Code PathFinder work?
Code Pathfinder uses tree-sitter to parse the source code and build a graph of the code. The graph is then used to find answers to queries. Similar to SQL, Code Pathfinder uses a query language to filter and apply conditions to the graph nodes logically. Sometimes, it generates a cartesian product of the graph nodes to retrieve all possible combinations and applies the conditions in order to find the paths in code. While there are still many APIs yet to be implemented and it lacks support for classes and inheritance, Code Pathfinder is currently equipped with the following features:
- Predicates
- Complex conditions
- Aliases
Getting Started
Code Pathfinder is a command-line tool that can be installed using the following command:
or you can download pre-built binaries from the releases page.
Contributing to Code Pathfinder OSS
If you are interested in contributing to Code Pathfinder, please check out the Code Pathfinder repository. Give it a try and file an issue if you find any bugs or have any suggestions.