Tools Included

  • ldb: Linked-list database for storing Knowledge Base (KB) data
  • minr: Mining tool for downloading and indexing open-source software (OSS) components
  • SCANOSS: Scanning engine for querying the KB

Quick Start

1. Build the Docker Image

docker build -t scanoss-stack .

2. Start the Container

# sleep infinity keeps the container running so subsequent commands can be executed against it
docker run -d --name scanoss scanoss-stack sleep infinity

3. Create a Knowledge Base

# Download and index an OSS component into the mined/ directory
# (-d takes vendor,component,version,release_date,license,purl; -u takes the download URL)
docker exec scanoss minr -d scanoss,webhook,1.0,20200320,BSD-3-Clause,pkg:github/scanoss/webhook \
     -u https://github.com/scanoss/webhook/archive/1.0.tar.gz

# Extract snippet fingerprints from the mined/ directory
docker exec scanoss minr -z mined

# Create the version file required by the minr import step
docker exec scanoss bash -c "echo '{\"monthly\":\"25.01\", \"daily\":\"25.01.12\"}' > mined/version.json"

# Import the mined data into the KB
docker exec scanoss minr -i mined/
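
The version file above uses hard-coded dates. A minimal sketch of generating the same JSON from the current date follows. It assumes, based only on the sample values, that "monthly" is YY.MM and "daily" is YY.MM.DD. The make_version_json helper is hypothetical and is not part of minr.

```shell
# Hypothetical helper: print a version.json body.
# Assumption: "monthly" is YY.MM and "daily" is YY.MM.DD, as the sample values suggest.
# Both values default to today's date and can be overridden with arguments.
make_version_json() {
  monthly="${1:-$(date +%y.%m)}"
  daily="${2:-$(date +%y.%m.%d)}"
  printf '{"monthly":"%s", "daily":"%s"}\n' "$monthly" "$daily"
}

# Possible usage (requires the running container from step 2):
#   make_version_json | docker exec -i scanoss bash -c "cat > mined/version.json"
```

Piping through docker exec -i avoids the nested quote escaping needed when the JSON is written inline.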

4. Download Test Files

Download and extract the same archive so that one of its files can be used as a scan target:
docker exec scanoss curl -sL https://github.com/scanoss/webhook/archive/1.0.tar.gz -o /tmp/test.tar.gz
docker exec scanoss tar -xzf /tmp/test.tar.gz -C /tmp

5. Scan the Original File

Scanning an unmodified file against the KB should produce a 100% file match:
docker exec scanoss scanoss /tmp/webhook-1.0/scanoss/github.py | jq
Expected output:
{
  "github.py": [{
    "id": "file",
    "matched": "100%",
    "purl": ["pkg:github/scanoss/webhook"],
    ...
  }]
}
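
To check the match type in a script instead of reading the JSON by eye, the "id" field can be pulled out of the scan output. The sketch below uses grep and sed so that it has no dependencies beyond a POSIX shell. The match_type helper is hypothetical, and it assumes the first "id" key in the output is the one of interest.

```shell
# Hypothetical helper: read scan JSON on stdin and print the first "id" value
# (for example "file" or "snippet").
match_type() {
  grep -o '"id"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | head -n 1 \
    | sed 's/.*"\([^"]*\)"$/\1/'
}

# Possible usage (requires the running container):
#   docker exec scanoss scanoss /tmp/webhook-1.0/scanoss/github.py | match_type
```

With jq available on the host, jq -r '.[][0].id' is a more robust way to read the same field.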

6. Modify the File

docker exec scanoss bash -c "echo '# modified by user' >> /tmp/webhook-1.0/scanoss/github.py"

7. Scan the Modified File

Scanning a modified file should produce a snippet match, reflecting partial code reuse:
docker exec scanoss scanoss /tmp/webhook-1.0/scanoss/github.py | jq
Expected output:
{
  "github.py": [{
    "id": "snippet",
    "matched": "97%",
    "lines": "1-192",
    "oss_lines": "3-194",
    "purl": ["pkg:github/scanoss/webhook"],
    ...
  }]
}
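
The "lines" and "oss_lines" values are inclusive line ranges. As a worked example, the sketch below counts the lines covered by a range string. It assumes the format is START-END, optionally repeated with commas. The range_len helper is hypothetical.

```shell
# Hypothetical helper: count the lines covered by a range string such as "1-192".
# Assumption: ranges are inclusive START-END pairs, optionally comma-separated.
range_len() {
  total=0
  old_ifs=$IFS
  IFS=,
  for r in $1; do
    # ${r%-*} is the start of the range, ${r#*-} is the end
    total=$(( total + ${r#*-} - ${r%-*} + 1 ))
  done
  IFS=$old_ifs
  echo "$total"
}

range_len "1-192"   # lines in the scanned file
range_len "3-194"   # lines in the OSS file
```

Both calls print 192, so the two ranges in the expected output cover the same number of lines.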

8. Cleanup

docker stop scanoss && docker rm scanoss

End-to-End Example

# Build and start
docker build -t scanoss-stack .
docker run -d --name scanoss scanoss-stack sleep infinity

# Create Knowledge Base
docker exec scanoss minr -d scanoss,webhook,1.0,20200320,BSD-3-Clause,pkg:github/scanoss/webhook \
     -u https://github.com/scanoss/webhook/archive/1.0.tar.gz
docker exec scanoss minr -z mined
docker exec scanoss bash -c "echo '{\"monthly\":\"25.01\", \"daily\":\"25.01.12\"}' > mined/version.json"
docker exec scanoss minr -i mined/

# Download test files
docker exec scanoss curl -sL https://github.com/scanoss/webhook/archive/1.0.tar.gz -o /tmp/test.tar.gz
docker exec scanoss tar -xzf /tmp/test.tar.gz -C /tmp

# Test file matching (100%)
docker exec scanoss scanoss /tmp/webhook-1.0/scanoss/github.py | jq

# Modify file
docker exec scanoss bash -c "echo '# modified by user' >> /tmp/webhook-1.0/scanoss/github.py"

# Test snippet matching
docker exec scanoss scanoss /tmp/webhook-1.0/scanoss/github.py | jq

# Cleanup
docker stop scanoss && docker rm scanoss

Scanning Your Own Files

Copy Files into the Container

To scan files from your host machine, copy them into the container first:
# Copy a file into the container
docker cp /path/to/your/file.py scanoss:/tmp/file.py

# Scan it
docker exec scanoss scanoss /tmp/file.py
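
The copy and scan steps can be wrapped in one function when several files need checking. This is a minimal sketch: the container_path and scan_host_file helpers are hypothetical, and placing every file directly under /tmp in the container is an assumption carried over from the example above.

```shell
# Hypothetical helper: map a host path to a destination under /tmp in the container.
container_path() {
  printf '/tmp/%s\n' "$(basename "$1")"
}

# Hypothetical helper: copy one host file into the "scanoss" container and scan it.
scan_host_file() {
  dest="$(container_path "$1")"
  docker cp "$1" "scanoss:$dest" && docker exec scanoss scanoss "$dest"
}

# Possible usage (requires the running container):
#   for f in src/*.py; do scan_host_file "$f"; done
```

Files with the same base name overwrite each other under /tmp, so this only suits flat sets of files.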

Mount a Volume

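To scan a directory of files without copying them individually, mount it as a volume when starting the container. The name scanoss must be free, so remove any existing container first (step 8). A new container does not inherit a KB created inside a previous one, so repeat step 3 before scanning:
docker run -d --name scanoss -v "$(pwd)/mycode:/code" scanoss-stack sleep infinity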
docker exec scanoss scanoss /code/myfile.py
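
Docker treats a -v source that is not an absolute path as a named volume, so a relative directory has to be resolved first. The sketch below builds the mount argument from any directory. The mount_arg helper is hypothetical, and the :ro (read-only) suffix is an optional choice made here on the assumption that scanning only reads the files.

```shell
# Hypothetical helper: build a "-v" argument that mounts a host directory
# read-only at /code. The directory is resolved to an absolute path.
mount_arg() {
  printf '%s:/code:ro\n' "$(cd "$1" && pwd)"
}

# Possible usage (requires Docker and the scanoss-stack image):
#   docker run -d --name scanoss -v "$(mount_arg mycode)" scanoss-stack sleep infinity
```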