# Atom npm wrapper

Atom is a novel intermediate representation for applications and a standalone tool powered by the [chen](https://github.com/AppThreat/chen) library. The intermediate representation (a network with nodes and links) is optimised for operations typically used for application analytics and machine learning, including [slicing](./specification/docs/slices.md) and vectoring.

This package wraps the atom Java distributable and makes it available via the npm package registry. Ensure Java 21 is available in the PATH.

## Usage

```shell
npm install @appthreat/atom
```

For broader language support, also install `@appthreat/atom-parsetools`.

```shell
npm install -g @appthreat/atom-parsetools
```

## Languages supported

- C/C++
- H (C/C++ Header and pre-processed .i files alone)
- Java (Requires compilation)
- Jar
- Android APK (Requires Android SDK. Set the environment variable `ANDROID_HOME` or use the container image.)
- JavaScript
- Flow
- TypeScript
- Python (Supports 3.x to 3.13)
- PHP (Requires PHP >= 7.4. Supports PHP 7.0 to 8.4 with limited support for PHP 5.x)
- Ruby (Requires Ruby 4.0.x. Supports Ruby 1.8 - 4.0.x syntax)
- Scala (WIP)

## CLI Usage

```
Usage: atom [parsedeps|data-flow|usages|reachables] [options] [input]

  input                                source file or directory
  -o, --output <value>                 output filename. Default app.⚛ or app.atom in windows
  -s, --slice-outfile <value>          export intra-procedural slices as json
  -l, --language <value>               source language
  --with-data-deps                     generate the atom with data-dependencies - defaults to `false`
  --remove-atom                        do not persist the atom file - defaults to `false`
  --reuse-atom                         reuse existing atom file - defaults to `false`
  -x, --export-atom                    export the atom file with data-dependencies to graphml - defaults to `false`
  --export-dir <value>                 export directory. Default: atom-exports
  --file-filter <value>                the name of the source file to generate slices from. Uses regex.
  --method-name-filter <value>         filters in slices that go through specific methods by names. Uses regex.
  --method-parameter-filter <value>    filters in slices that go through methods with specific types on the method parameters. Uses regex.
  --method-annotation-filter <value>   filters in slices that go through methods with specific annotations on the methods. Uses regex.
  --max-num-def <value>                maximum number of definitions in per-method data flow calculation - defaults to 2000
Command: parsedeps
Extract dependencies from the build file and imports
Command: data-flow [options]
Extract backward data-flow slices
  --slice-depth <value>                the max depth to traverse the DDG for the data-flow slice - defaults to 7.
  --sink-filter <value>                filters on the sink's `code` property. Uses regex.
Command: usages [options]
Extract local variable and parameter usages
  --min-num-calls <value>              the minimum number of calls required for a usage slice - defaults to 1.
  --include-source                     includes method source code in the slices - defaults to false.
  --extract-endpoints                  extract http endpoints and convert to openapi format using atom-tools - defaults to false.
Command: reachables [options]
Extract reachable data-flow slices based on automated framework tags
  --source-tag <value>                 source tag - defaults to framework-input. Comma-separated values allowed.
  --sink-tag <value>                   sink tag - defaults to framework-output. Comma-separated values allowed.
  --include-crypto                     includes crypto library flows - defaults to false.
  --help                               display this help message
```

## Sample Invocations

### Generate an atom

```shell
# Compile java project
atom -o app.atom -l java .
```

```shell
atom -o app.atom -l jar <jar file>
```

```shell
export ANDROID_HOME=<path to android sdk>
atom -o app.atom -l apk <apk file>
```

### Create reachables slice for a java project.

```shell
cd <path to repo>
cdxgen -t java --deep -o bom.json .
atom reachables -o app.atom -s reachables.json -l java .
```

Pass the argument `--reuse-atom` to slice based on an existing atom file.

```shell
atom reachables --reuse-atom -o app.atom -s reachables.json -l java .
```

## Environment variables

| Variable | Description |
| --- | --- |
| **CHEN_IGNORE_DIRS** | Comma-separated list of directories to ignore for every language frontend. |
| **CHEN_IGNORE_TEST_DIRS** | Set to true to ignore common test directories (`test`, `tests`, `mocks`) for every language frontend. |
| **CHEN_C_IGNORE_DIRS** | Comma-separated list of additional directories to ignore for the C/C++ and header frontends. |
| **CHEN_CPP_IGNORE_DIRS** | Comma-separated list of additional directories to ignore for the C++ frontend. |
| **CHEN_JAVA_IGNORE_DIRS** | Comma-separated list of additional directories to ignore for the Java source frontend. |
| **CHEN_JIMPLE_IGNORE_DIRS** | Comma-separated list of directories to ignore for the Jimple/JAR/Android/APK/DEX frontend. |
| **CHEN_SCALA_IGNORE_DIRS** | Comma-separated list of directories to ignore for the Scala frontend. |
| **CHEN_JAVASCRIPT_IGNORE_DIRS** | Comma-separated list of directories to ignore for the JavaScript, TypeScript, and Flow frontend. |
| **CHEN_JS_IGNORE_DIRS** | Alias for JavaScript, TypeScript, and Flow ignored directories. |
| **CHEN_TYPESCRIPT_IGNORE_DIRS** | Alias for TypeScript and Flow ignored directories. |
| **CHEN_PYTHON_IGNORE_DIRS** | Comma-separated list of directories to ignore for Python. If unset, Atom uses Python's default ignored directories. |
| **CHEN_PHP_IGNORE_DIRS** | Comma-separated list of additional directories to ignore for the PHP frontend. |
| **CHEN_RUBY_IGNORE_DIRS** | Comma-separated list of additional directories to ignore for the Ruby frontend. |
| **CHEN_DELOMBOK_MODE** | Delombok mode for the Java frontend (`no-delombok`, `default`, `types-only`, `run-delombok`). |
| **CHEN_INCLUDE_PATH** | Include directories for the C frontend. Separate paths with `:` or `;`. |
| **CHEN_ASTGEN_OUT** | Existing astgen output directory. Improves performance for JavaScript, TypeScript, and Flow during repeated invocations by reusing existing AST json data. |
| **ATOM_TOOLS_OPENAPI_FORMAT** | OpenAPI format for atom-tools. Default: `openapi3.1.0`; alternative: `openapi3.0.1`. |
| **ATOM_TOOLS_WORK_DIR** | Working directory for atom-tools. Defaults to atom input path. |
| **ATOM_SCALASEM_WORK_DIR** | Working directory for scalasem. Defaults to atom input path. |
| **ATOM_SCALASEM_SLICES_FILE** | Slices file name. Defaults to `semantics.slices.json`. |
| **ATOM_JVM_ARGS** | Overrides the JVM arguments, including heap memory values, constructed by the atom Node.js wrapper. |
| **ATOM_JAVA_HOME** | Java 21 or above to be used by atom. |
| **PHP_CMD** | Overrides the PHP command used by the PHP frontend. |
| **PHP_PARSER_BIN** | Overrides the php-parse command used by the PHP frontend. |
| **SCALA_CMD** | Overrides the scala command. |
| **SCALAC_CMD** | Overrides the scalac command used by the scala frontend. |
| **ASTGEN_IGNORE_DIRS** | Comma-separated list of directories to ignore by the JavaScript astgen pre-processor command. |
| **ASTGEN_IGNORE_FILE_PATTERN** | File pattern to ignore by the JavaScript astgen pre-processor command. |
| **ASTGEN_INCLUDE_NODE_MODULES_BUNDLES** | Also include source code from node_modules directory. Makes the flows more complete at the cost of increased memory use. |
| **JAVA_CMD** | Overrides the java command. |
| **RUBY_CMD** | Overrides the Ruby command. |

## Troubleshooting

### atom file is incomplete for large projects

astgen might require a large heap for large JavaScript projects, especially Flow projects. Use the environment variable `NODE_OPTIONS` to increase the memory available.

```bash
export NODE_OPTIONS="--expose-gc --max-old-space-size=16288"
```

For large projects such as React 19, astgen requires over 80 GB of heap memory!

Use the environment variable `CHEN_ASTGEN_OUT` to make atom and chen reuse an existing directory containing astgen-generated json and typemap files.

To improve the accuracy further, include source code from the `node_modules` directory by setting `ASTGEN_INCLUDE_NODE_MODULES_BUNDLES`.

```bash
export ASTGEN_INCLUDE_NODE_MODULES_BUNDLES=true
export ASTGEN_IGNORE_DIRS=""
```

## License

MIT