Commit d594cc47 authored by Carsten Emde

Fixed some bugs that prevented scanning an entire root filesystem,

expanded README.md and added example graphics.
parent 0fc3da98
......@@ -34,52 +34,120 @@ Copyright 2018-2019 - Armijn Hemel
Copyright 2021 - Open Source Automation Development Lab (OSADL) eG, author Carsten Emde
# Command line options
# Command line syntax
* usage: generatecypher.py [-h] [-c FILE] [-d DIR] [-f] [-o FORMAT] [-s DIR] [-t FILE] [-v]
* Usage
```shell
generatecypher.py [-h] [-c FILE] [-d DIR] [-f] [-o FORMAT] [-s DIR] [-t FILE] [-v]
```
* optional arguments:
* Explanations
```shell
-h, --help show this help message and exit
-c FILE, --config FILE
path to configuration file
path to configuration file (required)
-d DIR, --directory DIR
path to directory to scan
path to directory to scan (required)
-f, --flat do not recurse through directories to scan
-o FORMAT, --outputformat FORMAT
output format 'cypher', 'gexf', 'gv' or 'text', default 'gv'
-p FONT, --fontname FONT
name of the font to be used throughout the document ('gv' only)
-s DIR, --skipdirs DIR
exclude directories from being scanned
-t FILE, --targets FILE
only examine file or comma-separated list of files
-x, --symbols include symbols and their relations (default when format is 'cypher')
-v, --verbose be more verbose about what the program is actually doing
```
# Selection of individual targets and directories to scan
If no target is specified using the -t option, all discovered binaries below the
given scan directory will be considered. For every incompatible ELF data set
(e.g. different endianness) a separate graph will be created.
If a target is specified using the -t option, only files that depend on this
target and have the same ELF data set will be included in the scan.
If several targets are specified, the ELF data set of the first target is taken
as reference and subsequent targets are excluded if they do not match this ELF
data set.
The targets must be specified relative to the scan directory specified using the
-d command line switch.
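The selection rule for several targets can be sketched as follows. This is a minimal illustration and not the code used by generatecypher.py: the `ElfDataSet` tuple, the `select_targets` helper, the field values and the paths `bin/busybox` and `bin/oldtool` are hypothetical, and the ELF properties are passed in as plain values rather than read from files.

```python
from typing import NamedTuple


class ElfDataSet(NamedTuple):
    """Hypothetical bundle of the ELF properties that must match."""
    architecture: str
    machine: str
    endian: str
    elfclass: int


def select_targets(targets):
    """Keep the targets whose ELF data set matches that of the first target.

    `targets` is a list of (path, ElfDataSet) pairs in command line order.
    """
    if not targets:
        return []
    reference = targets[0][1]  # the first target defines the reference
    return [path for path, dataset in targets if dataset == reference]


# Made-up example values; only the comparison logic matters here.
arm_le = ElfDataSet("ARM", "linux", "little", 32)
arm_be = ElfDataSet("ARM", "linux", "big", 32)
selected = select_targets([
    ("bin/bash.bash", arm_le),
    ("bin/busybox", arm_le),
    ("bin/oldtool", arm_be),  # excluded: endianness differs from first target
])
print(selected)
```

Because the first target is the reference, the order of the comma-separated list given to -t matters in this sketch.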
# Examples
1. Scan a single file from the root file system of an embedded system and
specify two directories to ignore using the -s command line option. This is the
usual way to conduct a callgraph scan. The two graphical outputs of this example
are provided in the 'graphics' directory of the repository. The SVG version is
displayed below.
```shell
./generatecypher.py -d rootfs -s rootfs/usr,rootfs/lib/modules -c graph.config -t bin/bash.bash
dot -Tsvg /raid/src/callgraph/gvdir/*.gv >gv.svg
dot -Tpdf /raid/src/callgraph/gvdir/*.gv >gv.pdf
```
![Graphical output in SVG format](/graphics/gv.svg)
2. Scan the entire root file system of an embedded system. This may take a long
time depending on the size of the root file system, and the output may become
too busy to be useful. However, if text output is selected, individual files of
interest may be searched in the output and analyzed, and as long as the root
file system is not modified the scan output can be reused.
```shell
./generatecypher.py -d rootfs -o text -c graph.config
grep bash textdir/*.text
/bin/bash.bash LINKSWITH /lib/libtinfo.so.5.9
/bin/bash.bash LINKSWITH /lib/libdl.so.2
/bin/bash.bash LINKSWITH /lib/libc.so.6
grep ^/lib/libtinfo.so.5.9 textdir/*.text
/lib/libtinfo.so.5.9 LINKSWITH /lib/libc.so.6
grep ^/lib/libdl.so.2 textdir/*.text
/lib/libdl.so.2 LINKSWITH /lib/libc.so.6
/lib/libdl.so.2 LINKSWITH /lib/ld-2.28.so
grep ^/lib/libc.so.6 textdir/*.text
/lib/libc.so.6 LINKSWITH /lib/ld-2.28.so
grep ^/lib/ld-2.28.so textdir/tmp81o7pvp8.text
```
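Instead of repeating grep by hand, the text output can also be walked programmatically. The following is a rough sketch that assumes the text output consists of lines of the form `A LINKSWITH B` as in the example above and that paths contain no whitespace; the helper names `parse_links` and `all_dependencies` are hypothetical and not part of generatecypher.py.

```python
from collections import defaultdict


def parse_links(lines):
    """Build a mapping file -> set of libraries from 'A LINKSWITH B' lines."""
    links = defaultdict(set)
    for line in lines:
        parts = line.split()
        # Ignore anything that is not a three-field LINKSWITH relation.
        if len(parts) == 3 and parts[1] == "LINKSWITH":
            links[parts[0]].add(parts[2])
    return links


def all_dependencies(links, start):
    """Return every library reachable from `start`, directly or indirectly."""
    seen = set()
    stack = [start]
    while stack:
        current = stack.pop()
        for library in links.get(current, ()):
            if library not in seen:
                seen.add(library)
                stack.append(library)
    return seen


# Lines copied from the grep output shown above.
sample = """\
/bin/bash.bash LINKSWITH /lib/libtinfo.so.5.9
/bin/bash.bash LINKSWITH /lib/libdl.so.2
/bin/bash.bash LINKSWITH /lib/libc.so.6
/lib/libtinfo.so.5.9 LINKSWITH /lib/libc.so.6
/lib/libdl.so.2 LINKSWITH /lib/libc.so.6
/lib/libdl.so.2 LINKSWITH /lib/ld-2.28.so
/lib/libc.so.6 LINKSWITH /lib/ld-2.28.so
""".splitlines()

links = parse_links(sample)
dependencies = sorted(all_dependencies(links, "/bin/bash.bash"))
print(dependencies)
```

To use it on a real scan, the lines of a file in `textdir` would be passed to `parse_links` instead of the sample.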
3. Scan a single file of the host root file system. This may also take a very
long time depending on the size of the root file system and whether it was
possible to exclude irrelevant directories using the -s command.
```shell
./generatecypher.py -d / -c graph.config -t /bin/bash
```
4. Scan the entire host root file system. This normally exceeds by far the
capabilities of this callgraph tool (and probably also of the graphics
converters) and is not recommended.
```shell
./generatecypher.py -d / -c graph.config
```
# Limitation
This callgraph generator only considers ELF files. High-level language function
calls such as using external shell functions, including objects of external Java
classes or similar methods of code reuse in Python and PHP cannot be analyzed.
# Scope
Only when output format is 'cypher' all symbols with related exporters and users
are included in the output by default, in all other output formats this must explicitly
be configured using the '-x' option. Heavily linked programs with a large number of
unresolved symbols may take too long to be converted into a graph or, when finally
succeeded to draw, the graph is too busy to be used.
are included in the output by default; in all other output formats this must
explicitly be configured using the '-x' option. Heavily linked programs with a
large number of unresolved symbols may take too long to be converted into a
graph or, if drawing finally succeeds, the graph may be too busy to be useful.
# Getting Gephi (tested with version 0.9.2)
Get Gephi from
https://gephi.org/users/download/
and follow the installation instructions
Get Gephi from https://gephi.org/users/download/ and follow the installation
instructions.
# Getting Graphviz (tested with version 2.42.4)
......@@ -87,48 +155,42 @@ Graphviz in included in nearly all popular Linux distributions. The recommended
binary is 'dot'; it must be executed in a subsequent step to convert the
callgraph output into one of the supported display formats such as PDF or SVG,
e.g.
```shell
dot -Tpdf callgraph-output.gv >gv-display.pdf
dot -Tsvg callgraph-output.gv >gv-display.svg
```
In addition, it is possible to select different font name and size using command
line options, e.g.
```shell
dot -Nfontname=Korolev -Nfontsize=16 -Tpdf callgraph-output.gv >/tmp/gv-display.pdf
```
There is also a Graphviz live visual editor.
# Getting the Graphviz visual editor (tested with version 0.6.4+)
Get the Graphviz visual editor from
https://github.com/magjac/graphviz-visual-editor
and follow the installations instructions
1. git clone https://github.com/magjac/graphviz-visual-editor
2. cd graphviz-visual-editor
3. npm install
4. make
5. npm run start
https://github.com/magjac/graphviz-visual-editor and follow the installation
instructions:
```shell
git clone https://github.com/magjac/graphviz-visual-editor
cd graphviz-visual-editor
npm install
make
npm run start
```
You may then access the Graphviz visual editor by entering
https://localhost:3000 into your browser of choice.
https://localhost:3000
into your browser of choice.
# Getting Neo4J (tested with version 3.4.9 community edition)
# Neo4J
Get the community edition at:
## Getting Neo4J (tested with version 3.4.9 community edition)
https://neo4j.com/download-center/
Get the community edition at https://neo4j.com/download-center/.
Since Neo4J tends to shuffle these download links around every once in a while
it might not be accurate at some point in time.
# Usage
## Usage
1. start and configure Neo4J (out of scope of this document)
2. unpack a root file system of a firmware into a directory (example: /tmp/rootfs)
......@@ -136,7 +198,7 @@ it might not be accurate at some point in time.
4. run the script: `python3 generatecypher.py -c /path/to/config -d /path/to/directory`
5. load the resulting Cypher file into Neo4J
# Example
## Example
(picture for this example can be found in the directory "pics")
......@@ -186,33 +248,32 @@ To select all files that link with a certain library (figure 6):
On Unix(-like) systems such as Linux executables are typically in the ELF
executable format. On most systems the executables are dynamically linked,
meaning that dependencies are only resolved and loaded at run time, instead
of at build time. Some open source licenses explicitly mention dynamic linking
(for example LGPL 2.1, section 6b) which makes it important to know which
files link with each other.
meaning that dependencies are only resolved and loaded at run time, instead of
at build time. Some open source licenses explicitly mention dynamic linking (for
example LGPL 2.1, section 6b) which makes it important to know which files link
with each other.
Looking at a single file is therefore not enough. Even looking at the direct
dependencies is not sufficient but the whole linking graph has to be looked
at to find out what the (likely) run time dependencies are.
dependencies is not sufficient but the whole linking graph has to be looked at
to find out what the (likely) run time dependencies are.
ELF files record several bits of useful information:
1. a list of symbols (function names, variable names) that are needed at
runtime
1. a list of symbols (function names, variable names) that are needed at runtime
2. a list of symbols (function names, variable names) that are exported/made
available
3. a list of file names of other ELF files (or symbolic links to other ELF
files) in which the symbols can possibly be found
During run time the so called "dynamic linker" sees if the ELF files from
step 3 can be found in its search path. If so it extracts the symbols from
these files (step 2) and matches them with the symbols from step 1. It is
possible to have two libraries with the same name but in different paths. Which
library is chosen depends on the configuration of the dynamic linker.
During run time the so called "dynamic linker" sees if the ELF files from step 3
can be found in its search path. If so it extracts the symbols from these files
(step 2) and matches them with the symbols from step 1. It is possible to have
two libraries with the same name but in different paths. Which library is chosen
depends on the configuration of the dynamic linker.
Sometimes some search paths are hardcoded to a specific ELF file using the
so called "RPATH", which makes it possible to somewhat limit from which
libraries symbols are chosen.
Sometimes some search paths are hardcoded to a specific ELF file using the so
called "RPATH", which makes it possible to somewhat limit from which libraries
symbols are chosen.
The scripts here do something similar to the dynamic linker, but instead of
running the program, graphs are created for displaying and searching.
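The matching of the three lists described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the algorithm of generatecypher.py or of a real dynamic linker: the `resolve_symbols` helper and the sample symbol sets are hypothetical, and symbol versioning, weak symbols, search paths and RPATH are ignored.

```python
def resolve_symbols(needed_symbols, needed_libraries, exported_by_library):
    """Match undefined symbols against the exports of the listed libraries.

    Returns (resolved, unresolved): `resolved` maps each symbol to the first
    library in `needed_libraries` that exports it.
    """
    resolved = {}
    unresolved = set()
    for symbol in needed_symbols:
        for library in needed_libraries:
            if symbol in exported_by_library.get(library, set()):
                resolved[symbol] = library
                break
        else:
            # No listed library exports this symbol.
            unresolved.add(symbol)
    return resolved, unresolved


# Made-up export lists for illustration only.
exports = {
    "/lib/libc.so.6": {"printf", "malloc"},
    "/lib/libdl.so.2": {"dlopen"},
}
resolved, unresolved = resolve_symbols(
    ["printf", "dlopen", "tgetent"],          # step 1: needed symbols
    ["/lib/libdl.so.2", "/lib/libc.so.6"],    # step 3: libraries to search
    exports,                                  # step 2: exported symbols
)
print(resolved, unresolved)
```

Symbols that end up in `unresolved` correspond to the unresolved symbols mentioned in the Scope section; in this sketch `tgetent` remains unresolved because no library exporting it was listed.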
......@@ -54,7 +54,7 @@ import elftools.elf.dynamic
import elftools.elf.sections
def notarget(filename, limitsearch):
if limitsearch is None:
if len(limitsearch) == 0:
return False
filefound = False
for file in limitsearch:
......@@ -149,16 +149,16 @@ def createoutput(outputdir, outputformat, machine_to_binary, linked_libraries,
newline = "\n"
for architecture in machine_to_binary:
if architecture != target_architecture:
if target_architecture != "" and architecture != target_architecture:
continue
for o in machine_to_binary[architecture]:
if o != target_machine:
if target_machine != "" and o != target_machine:
continue
for endian in machine_to_binary[architecture][o]:
if endian != target_endian:
if target_endian != "" and endian != target_endian:
continue
for elfclass in machine_to_binary[architecture][o][endian]:
if elfclass != target_elfclass:
if target_elfclass != "" and elfclass != target_elfclass:
continue
elf_to_placeholder = {}
placeholder_to_elf = {}
......@@ -204,7 +204,11 @@ def createoutput(outputdir, outputformat, machine_to_binary, linked_libraries,
outputfileopen.write("<graph defaultedgetype=\"directed\" idtype=\"string\" type=\"static\">\n")
outputfileopen.write("<nodes>\n")
elif outputformat == 'gv':
outputfileopen.write("digraph " + os.path.basename(limitsearch[0]).replace('.', '_') + " {\n ratio=0.562;\n")
if len(limitsearch) == 0:
digraphname = "full_directory_scan"
else:
digraphname = os.path.basename(limitsearch[0]).replace('.', '_')
outputfileopen.write("digraph " + digraphname + " {\n ratio=0.562;\n")
if fontname is not None:
outputfileopen.write(" graph [fontname=\"" + fontname + "\"];\n")
outputfileopen.write(" node [fontname=\"" + fontname + "\"];\n")
......
File added
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<!-- Generated by graphviz version 2.42.4 (0)
-->
<!-- Title: bash_bash Pages: 1 -->
<svg width="457pt" height="260pt"
viewBox="0.00 0.00 457.49 260.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 256)">
<title>bash_bash</title>
<polygon fill="white" stroke="transparent" points="-4,4 -4,-256 453.49,-256 453.49,4 -4,4"/>
<!-- kAMqjuFa -->
<g id="node1" class="node">
<title>kAMqjuFa</title>
<g id="a_node1"><a xlink:title="/lib/ld&#45;2.28.so">
<ellipse fill="none" stroke="black" cx="134.76" cy="-18" rx="46.29" ry="18"/>
<text text-anchor="middle" x="134.76" y="-14.3" font-family="Times,serif" font-size="14.00">ld&#45;2.28.so</text>
</a>
</g>
</g>
<!-- zJZdGDHc -->
<g id="node2" class="node">
<title>zJZdGDHc</title>
<g id="a_node2"><a xlink:title="/lib/libdl.so.2">
<ellipse fill="none" stroke="black" cx="74.76" cy="-162" rx="44.39" ry="18"/>
<text text-anchor="middle" x="74.76" y="-158.3" font-family="Times,serif" font-size="14.00">libdl.so.2</text>
</a>
</g>
</g>
<!-- zJZdGDHc&#45;&gt;kAMqjuFa -->
<g id="edge2" class="edge">
<title>zJZdGDHc&#45;&gt;kAMqjuFa</title>
<path fill="none" stroke="black" d="M81.88,-144.15C92.21,-119.71 111.45,-74.17 123.66,-45.29"/>
<polygon fill="black" stroke="black" points="126.99,-46.4 127.66,-35.82 120.54,-43.67 126.99,-46.4"/>
</g>
<!-- TsRNJmPF -->
<g id="node4" class="node">
<title>TsRNJmPF</title>
<g id="a_node4"><a xlink:title="/lib/libc.so.6">
<ellipse fill="none" stroke="black" cx="195.76" cy="-90" rx="40.89" ry="18"/>
<text text-anchor="middle" x="195.76" y="-86.3" font-family="Times,serif" font-size="14.00">libc.so.6</text>
</a>
</g>
</g>
<!-- zJZdGDHc&#45;&gt;TsRNJmPF -->
<g id="edge1" class="edge">
<title>zJZdGDHc&#45;&gt;TsRNJmPF</title>
<path fill="none" stroke="black" d="M99.25,-146.83C117.58,-136.23 142.83,-121.62 162.91,-110.01"/>
<polygon fill="black" stroke="black" points="164.82,-112.94 171.73,-104.91 161.32,-106.88 164.82,-112.94"/>
</g>
<!-- OpxaXgaC -->
<g id="node3" class="node">
<title>OpxaXgaC</title>
<g id="a_node3"><a xlink:title="/bin/bash.bash">
<ellipse fill="none" stroke="black" cx="197.76" cy="-234" rx="45.49" ry="18"/>
<text text-anchor="middle" x="197.76" y="-230.3" font-family="Times,serif" font-size="14.00">bash.bash</text>
</a>
</g>
</g>
<!-- OpxaXgaC&#45;&gt;zJZdGDHc -->
<g id="edge4" class="edge">
<title>OpxaXgaC&#45;&gt;zJZdGDHc</title>
<path fill="none" stroke="black" d="M172.87,-218.83C154.37,-208.3 128.91,-193.81 108.56,-182.24"/>
<polygon fill="black" stroke="black" points="110.04,-179.05 99.62,-177.14 106.58,-185.13 110.04,-179.05"/>
</g>
<!-- OpxaXgaC&#45;&gt;TsRNJmPF -->
<g id="edge5" class="edge">
<title>OpxaXgaC&#45;&gt;TsRNJmPF</title>
<path fill="none" stroke="black" d="M197.52,-215.87C197.18,-191.67 196.56,-147.21 196.15,-118.39"/>
<polygon fill="black" stroke="black" points="199.65,-118.14 196.01,-108.19 192.65,-118.24 199.65,-118.14"/>
</g>
<!-- INTwJlrq -->
<g id="node5" class="node">
<title>INTwJlrq</title>
<g id="a_node5"><a xlink:title="/lib/libtinfo.so.5.9">
<ellipse fill="none" stroke="black" cx="346.76" cy="-162" rx="60.39" ry="18"/>
<text text-anchor="middle" x="346.76" y="-158.3" font-family="Times,serif" font-size="14.00">libtinfo.so.5.9</text>
</a>
</g>
</g>
<!-- OpxaXgaC&#45;&gt;INTwJlrq -->
<g id="edge3" class="edge">
<title>OpxaXgaC&#45;&gt;INTwJlrq</title>
<path fill="none" stroke="black" d="M226.14,-219.67C248.9,-208.98 281.13,-193.83 306.55,-181.89"/>
<polygon fill="black" stroke="black" points="308.36,-184.91 315.93,-177.49 305.39,-178.57 308.36,-184.91"/>
</g>
<!-- TsRNJmPF&#45;&gt;kAMqjuFa -->
<g id="edge6" class="edge">
<title>TsRNJmPF&#45;&gt;kAMqjuFa</title>
<path fill="none" stroke="black" d="M181.62,-72.76C173.84,-63.84 164.05,-52.61 155.43,-42.72"/>
<polygon fill="black" stroke="black" points="158.04,-40.39 148.84,-35.15 152.77,-44.99 158.04,-40.39"/>
</g>
<!-- INTwJlrq&#45;&gt;TsRNJmPF -->
<g id="edge7" class="edge">
<title>INTwJlrq&#45;&gt;TsRNJmPF</title>
<path fill="none" stroke="black" d="M315.48,-146.5C291.27,-135.28 257.71,-119.71 232.27,-107.92"/>
<polygon fill="black" stroke="black" points="233.51,-104.64 222.96,-103.61 230.56,-110.99 233.51,-104.64"/>
</g>
</g>
</svg>