- ldb: Linked-list database for storing Knowledge Base (KB) data
- minr: Mining tool for downloading and indexing open-source software (OSS) components
- SCANOSS: Scanning engine for querying the KB
Quick Start
1. Build the Docker Image
docker build -t scanoss-stack .
2. Start the Container
# sleep infinity keeps the container running so subsequent commands can be executed against it
docker run -d --name scanoss scanoss-stack sleep infinity
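Before running the remaining steps, it can help to confirm that the container actually stayed up. The following is a minimal sketch; `container_running` is a hypothetical helper name, not part of the stack, and it relies only on the standard `docker inspect` command.

```shell
# Hypothetical helper: succeeds only if the named container is running.
# $1 = container name (this guide uses "scanoss")
container_running() {
    # docker inspect prints "true" or "false" for .State.Running;
    # errors (for example, no such container) are treated as "not running"
    [ "$(docker inspect -f '{{.State.Running}}' "$1" 2>/dev/null)" = "true" ]
}
```

Usage: `container_running scanoss && echo "container is up"`.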
3. Create a Knowledge Base
# Download and index an OSS component into the mined/ directory
docker exec scanoss minr -d scanoss,webhook,1.0,20200320,BSD-3-Clause,pkg:github/scanoss/webhook \
-u https://github.com/scanoss/webhook/archive/1.0.tar.gz
# Extract snippet fingerprints from the mined/ directory
docker exec scanoss minr -z mined
# Create the version file required by the minr import step
docker exec scanoss bash -c "echo '{\"monthly\":\"25.01\", \"daily\":\"25.01.12\"}' > mined/version.json"
# Import the mined data into the KB
docker exec scanoss minr -i mined/
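The escaped quotes in the version file command are easy to get wrong. As a sketch, the same JSON can be produced with `printf` and piped into the container. `version_json` is a hypothetical helper, and the only thing assumed about the file is the two-field shape shown in the command above.

```shell
# Hypothetical helper: print a version.json body in the shape used in this guide.
# $1 = monthly value (for example 25.01), $2 = daily value (for example 25.01.12)
version_json() {
    printf '{"monthly":"%s", "daily":"%s"}\n' "$1" "$2"
}
```

Usage: `version_json 25.01 25.01.12 | docker exec -i scanoss sh -c 'cat > mined/version.json'`. The `-i` flag keeps stdin open so the piped JSON reaches `cat` inside the container.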
4. Download Test Files
Download and extract the same archive to use as a scan target:
docker exec scanoss curl -sL https://github.com/scanoss/webhook/archive/1.0.tar.gz -o /tmp/test.tar.gz
docker exec scanoss tar -xzf /tmp/test.tar.gz -C /tmp
5. Scan the Original File
Scanning an unmodified file against the KB should produce a 100% file match (jq runs on the host and only pretty-prints the JSON):
docker exec scanoss scanoss /tmp/webhook-1.0/scanoss/github.py | jq
Expected output:
{
"github.py": [{
"id": "file",
"matched": "100%",
"purl": ["pkg:github/scanoss/webhook"],
...
}]
}
6. Modify the File
docker exec scanoss bash -c "echo '# modified by user' >> /tmp/webhook-1.0/scanoss/github.py"
7. Scan the Modified File
Scanning a modified file should produce a snippet match, reflecting partial code reuse:
docker exec scanoss scanoss /tmp/webhook-1.0/scanoss/github.py | jq
Expected output:
{
"github.py": [{
"id": "snippet",
"matched": "97%",
"lines": "1-192",
"oss_lines": "3-194",
"purl": ["pkg:github/scanoss/webhook"],
...
}]
}
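To compare the two scans at a glance, the JSON can be reduced to one line per match. This is a sketch that assumes the real output has the shape of the abbreviated examples above (a top-level object keyed by file name, each value an array of matches with `id` and `matched` fields). `scan_summary` is a hypothetical helper and requires jq on the host.

```shell
# Hypothetical helper: read scan JSON on stdin and print one
# tab-separated line per match: <file> <id> <matched>
scan_summary() {
    jq -r 'to_entries[] | .key as $file | .value[] | [$file, .id, .matched] | @tsv'
}
```

Usage: `docker exec scanoss scanoss /tmp/webhook-1.0/scanoss/github.py | scan_summary`.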
8. Cleanup
docker stop scanoss && docker rm scanoss
End-to-End Example
# Build and start
docker build -t scanoss-stack .
docker run -d --name scanoss scanoss-stack sleep infinity
# Create Knowledge Base
docker exec scanoss minr -d scanoss,webhook,1.0,20200320,BSD-3-Clause,pkg:github/scanoss/webhook \
-u https://github.com/scanoss/webhook/archive/1.0.tar.gz
docker exec scanoss minr -z mined
docker exec scanoss bash -c "echo '{\"monthly\":\"25.01\", \"daily\":\"25.01.12\"}' > mined/version.json"
docker exec scanoss minr -i mined/
# Download test files
docker exec scanoss curl -sL https://github.com/scanoss/webhook/archive/1.0.tar.gz -o /tmp/test.tar.gz
docker exec scanoss tar -xzf /tmp/test.tar.gz -C /tmp
# Test file matching (100%)
docker exec scanoss scanoss /tmp/webhook-1.0/scanoss/github.py | jq
# Modify file
docker exec scanoss bash -c "echo '# modified' >> /tmp/webhook-1.0/scanoss/github.py"
# Test snippet matching
docker exec scanoss scanoss /tmp/webhook-1.0/scanoss/github.py | jq
# Cleanup
docker stop scanoss && docker rm scanoss
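When the sequence above is run as a script, a failed step should stop the run and the container should still be removed. The following sketch wraps the same commands in functions. The names `in_container`, `cleanup`, and `build_kb` and the `CONTAINER` variable are hypothetical; the commands themselves are taken unchanged from this guide.

```shell
# Container name used throughout this guide; override via the environment if needed
CONTAINER="${CONTAINER:-scanoss}"

# Run a command inside the container
in_container() { docker exec "$CONTAINER" "$@"; }

# Force-remove the container; ignore the error if it does not exist
cleanup() { docker rm -f "$CONTAINER" >/dev/null 2>&1 || true; }

# Mine, fingerprint, version, and import; stops at the first failing step
build_kb() {
    in_container minr -d scanoss,webhook,1.0,20200320,BSD-3-Clause,pkg:github/scanoss/webhook \
        -u https://github.com/scanoss/webhook/archive/1.0.tar.gz &&
    in_container minr -z mined &&
    in_container bash -c "echo '{\"monthly\":\"25.01\", \"daily\":\"25.01.12\"}' > mined/version.json" &&
    in_container minr -i mined/
}
```

In a bash script, these could be used as `set -euo pipefail; trap cleanup EXIT`, followed by the `docker build` and `docker run` commands and then `build_kb`. The `trap` ensures the container is removed even when a step fails.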
Scanning Your Own Files
Copy Files into the Container
To scan files from your host machine, copy them into the container first:
# Copy a file into the container
docker cp /path/to/your/file.py scanoss:/tmp/file.py
# Scan it
docker exec scanoss scanoss /tmp/file.py
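The copy and scan steps can be combined into one call. This is a minimal sketch; `scan_host_file` is a hypothetical helper, and placing the copy under `/tmp` simply follows the example above.

```shell
# Hypothetical helper: copy one host file into the container and scan it.
# $1 = path on the host, $2 = container name (defaults to "scanoss")
scan_host_file() {
    # The file keeps its base name under /tmp; the scan runs only if the copy succeeds
    docker cp "$1" "${2:-scanoss}:/tmp/$(basename "$1")" &&
        docker exec "${2:-scanoss}" scanoss "/tmp/$(basename "$1")"
}
```

Usage: `scan_host_file /path/to/your/file.py`.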
Mount a Volume
To scan a directory of files without copying them individually, mount a volume when starting the container:
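# The name "scanoss" must be free; remove any earlier container first (see Cleanup)
# Quoting the mount argument keeps paths that contain spaces intact
docker run -d --name scanoss -v "$(pwd)/mycode:/code" scanoss-stack sleep infinity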
docker exec scanoss scanoss /code/myfile.py
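Bind mounts need an absolute host path, and a read-only mount keeps the container from changing your source tree. The sketch below builds such a mount argument. `volume_spec` is a hypothetical helper, and the `:ro` suffix assumes the scanner only needs read access to the files, which this guide does not state.

```shell
# Hypothetical helper: build a read-only bind-mount spec for `docker run -v`.
# $1 = host directory (relative or absolute); it is mounted at /code
volume_spec() {
    # Resolve to an absolute path; fail if the directory does not exist
    abs=$(cd "$1" && pwd) || return 1
    printf '%s:/code:ro\n' "$abs"
}
```

Usage: `docker run -d --name scanoss -v "$(volume_spec ./mycode)" scanoss-stack sleep infinity`.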