FolderHashScan

Scan folder structures to identify software components using hierarchical proximity hashing. The scan evaluates filenames, directory structure, and file content to generate similarity scores against the SCANOSS knowledge base.

Request Format

The request defines a hierarchical folder tree:
  • root: The root folder node containing the structure to scan
    • path_id: Folder identifier (can be obfuscated)
    • sim_hash_names: Hash derived from filenames
    • sim_hash_content: Hash derived from file contents
    • sim_hash_dir_names: Hash derived from directory structure
    • lang_extensions: File type distribution (e.g., {"py": 15, "js": 8})
    • children: Nested folder nodes following the same structure
Optional filtering parameters:
  • rank_threshold: Filter by component rank (integer on the 1–9 rank scale, e.g. 5)
  • category: Filter by ecosystem (e.g. github, npm, maven)
  • query_limit: Max results per folder
  • recursive_threshold: Minimum score for recursive matches
  • min_accepted_score: Minimum similarity score (0–1)
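
The request structure above can be sketched in Python as a small recursive data class. This is a minimal illustration, not an official client: `FolderNode` and `build_scan_request` are hypothetical names, and the hash strings are placeholders rather than real hashes. The sketch only assembles the JSON body; it does not compute hashes or call the API.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FolderNode:
    """One folder in the request tree (hypothetical helper type)."""
    path_id: str
    sim_hash_names: str
    sim_hash_content: str
    sim_hash_dir_names: str
    lang_extensions: Dict[str, int] = field(default_factory=dict)
    children: List["FolderNode"] = field(default_factory=list)

    def to_dict(self) -> Dict[str, Any]:
        # Recurse so nested folders keep the same shape as the root.
        return {
            "path_id": self.path_id,
            "sim_hash_names": self.sim_hash_names,
            "sim_hash_content": self.sim_hash_content,
            "sim_hash_dir_names": self.sim_hash_dir_names,
            "lang_extensions": dict(self.lang_extensions),
            "children": [child.to_dict() for child in self.children],
        }


# Optional filter names taken from the list above.
ALLOWED_FILTERS = {
    "rank_threshold",
    "category",
    "query_limit",
    "recursive_threshold",
    "min_accepted_score",
}


def build_scan_request(root: FolderNode, **filters: Any) -> Dict[str, Any]:
    """Combine the folder tree with whichever optional filters are set."""
    unknown = set(filters) - ALLOWED_FILTERS
    if unknown:
        raise ValueError(f"unknown filter(s): {sorted(unknown)}")
    payload: Dict[str, Any] = {"root": root.to_dict()}
    # Omit filters passed as None instead of sending JSON nulls.
    payload.update({k: v for k, v in filters.items() if v is not None})
    return payload


# Placeholder hash values, for illustration only.
utils = FolderNode("src/utils", "def456ghi789", "012jkl345mno",
                   "678pqr901stu", {"py": 8})
root = FolderNode("src", "abc123def456", "789ghi012jkl", "345mno678pqr",
                  {"py": 25, "md": 3}, [utils])
request_body = build_scan_request(root, category="github", query_limit=10,
                                  min_accepted_score=0.7)
print(json.dumps(request_body, indent=2))
```

Rejecting unknown filter names up front is a design choice of this sketch; it catches typos such as `min_score` before a request is sent.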

HTTP Request Example

```bash
curl -X POST 'https://api.scanoss.com/v2/scanning/hfh/scan' \
  -H 'Content-Type: application/json' \
  -H "X-Api-Key: $SC_API_KEY" \
  -d '{
    "root": {
      "path_id": "src",
      "sim_hash_names": "abc123def456",
      "sim_hash_content": "789ghi012jkl",
      "sim_hash_dir_names": "345mno678pqr",
      "lang_extensions": {
        "py": 25,
        "md": 3
      },
      "children": [
        {
          "path_id": "src/utils",
          "sim_hash_names": "def456ghi789",
          "sim_hash_content": "012jkl345mno",
          "sim_hash_dir_names": "678pqr901stu",
          "lang_extensions": {
            "py": 8
          },
          "children": []
        }
      ]
    },
    "rank_threshold": 5,
    "category": "github",
    "query_limit": 10,
    "min_accepted_score": 0.7
  }' | jq
```

Response Format

Returns scan results grouped by folder path.
  • results: List of folder paths with their matching components
  • status: Response status indicating success or failure
Each result includes:
  • path_id: The folder path identifier from the request
  • components: List of matching components found in this folder
Each component includes:
  • purl: Component Package URL
  • name: Component name
  • vendor: Component maintainer or organisation
  • versions: Matched versions with similarity scores
  • rank: Component quality rank (1–9)
  • order: Match priority (1 = best match)
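
To show how these fields relate, the sketch below picks the preferred component for each folder (lowest `order`) and, within it, the version with the highest `score`. The function name `best_matches` is hypothetical, and the sample data is a trimmed-down response using only the fields described above.

```python
from typing import Any, Dict, Optional


def best_matches(response: Dict[str, Any]) -> Dict[str, Optional[Dict[str, Any]]]:
    """Map each path_id to its top component and that component's best version."""
    summary: Dict[str, Optional[Dict[str, Any]]] = {}
    for result in response.get("results", []):
        components = result.get("components", [])
        if not components:
            summary[result["path_id"]] = None
            continue
        # order == 1 marks the best match, so take the smallest order value.
        top = min(components, key=lambda c: c["order"])
        # Among the matched versions, keep the one with the highest score.
        best = max(top.get("versions", []), key=lambda v: v["score"], default=None)
        summary[result["path_id"]] = {
            "purl": top["purl"],
            "rank": top["rank"],
            "version": best["version"] if best else None,
            "score": best["score"] if best else None,
        }
    return summary


# Trimmed sample; components are deliberately listed out of order.
sample = {
    "results": [
        {
            "path_id": "src",
            "components": [
                {"purl": "pkg:github/example/similar-project", "rank": 3, "order": 2,
                 "versions": [{"version": "2.1.0", "score": 0.78}]},
                {"purl": "pkg:github/scanoss/scanoss.py", "rank": 1, "order": 1,
                 "versions": [{"version": "1.29.0", "score": 0.87},
                              {"version": "1.30.0", "score": 0.95}]},
            ],
        }
    ],
    "status": {"status": "SUCCESS", "message": "Scan completed successfully"},
}

for path_id, match in best_matches(sample).items():
    print(path_id, match)
```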

Response Example

Successful Scan with Matches

```json
{
  "results": [
    {
      "path_id": "src",
      "components": [
        {
          "purl": "pkg:github/scanoss/scanoss.py",
          "name": "scanoss-py",
          "vendor": "scanoss",
          "versions": [
            {
              "version": "1.30.0",
              "score": 0.95,
              "licenses": [
                {
                  "name": "MIT License",
                  "spdx_id": "MIT",
                  "is_spdx_approved": true,
                  "url": "https://spdx.org/licenses/MIT.html"
                }
              ]
            },
            {
              "version": "1.29.0",
              "score": 0.87,
              "licenses": [
                {
                  "name": "MIT License",
                  "spdx_id": "MIT",
                  "is_spdx_approved": true,
                  "url": "https://spdx.org/licenses/MIT.html"
                }
              ]
            }
          ],
          "rank": 1,
          "order": 1
        },
        {
          "purl": "pkg:github/example/similar-project",
          "name": "similar-project",
          "vendor": "example",
          "versions": [
            {
              "version": "2.1.0",
              "score": 0.78,
              "licenses": [
                {
                  "name": "Apache License 2.0",
                  "spdx_id": "Apache-2.0",
                  "is_spdx_approved": true,
                  "url": "https://spdx.org/licenses/Apache-2.0.html"
                },
                {
                  "name": "MIT License",
                  "spdx_id": "MIT",
                  "is_spdx_approved": true,
                  "url": "https://spdx.org/licenses/MIT.html"
                }
              ]
            }
          ],
          "rank": 3,
          "order": 2
        }
      ]
    },
    {
      "path_id": "src/utils",
      "components": [
        {
          "purl": "pkg:pypi/requests",
          "name": "requests",
          "vendor": "psf",
          "versions": [
            {
              "version": "2.31.0",
              "score": 0.92,
              "licenses": [
                {
                  "name": "Apache License 2.0",
                  "spdx_id": "Apache-2.0",
                  "is_spdx_approved": true,
                  "url": "https://spdx.org/licenses/Apache-2.0.html"
                }
              ]
            }
          ],
          "rank": 1,
          "order": 1
        }
      ]
    }
  ],
  "status": {
    "status": "SUCCESS",
    "message": "Scan completed successfully"
  }
}
```

Scan with No Matches

```json
{
  "results": [],
  "status": {
    "status": "SUCCESS",
    "message": "Scan completed successfully"
  }
}
```
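
Because a scan with no matches still reports `SUCCESS`, client code may want to tell "nothing found" apart from "scan failed". The sketch below is one way to do that. `ScanFailed` and `extract_results` are hypothetical names, and treating every status value other than `SUCCESS` as a failure is an assumption, since only `SUCCESS` appears in the examples here.

```python
from typing import Any, Dict, List


class ScanFailed(Exception):
    """Raised when the response status is anything other than SUCCESS."""


def extract_results(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the results list, raising if the scan did not succeed."""
    status = response.get("status", {})
    # Assumption: any value other than "SUCCESS" indicates a failure.
    if status.get("status") != "SUCCESS":
        raise ScanFailed(status.get("message", "no message provided"))
    # An empty list is a valid outcome: the scan ran but matched nothing.
    return response.get("results", [])


no_matches = {
    "results": [],
    "status": {"status": "SUCCESS", "message": "Scan completed successfully"},
}

results = extract_results(no_matches)
if not results:
    print("Scan succeeded, no components matched")
```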

Match Quality

Rank (1-9) indicates confidence in the component's origin:
  • 1-2: Official or highly trusted repositories
  • 3-4: Verified community projects
  • 5-6: Standard open-source projects
  • 7-9: Lower confidence or derivative matches
Score (0-1) gives the similarity confidence for each matched version:
  • 0.9-1.0: Very high confidence
  • 0.8-0.89: High confidence
  • 0.7-0.79: Moderate confidence
  • Below 0.7: Low confidence (filterable via min_accepted_score)
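
The bands above can be turned into small lookup helpers for reporting. This is an illustrative sketch: `rank_tier` and `score_band` are hypothetical names, and the score bands are treated as half-open intervals (for example, 0.895 counts as "high"), which is an assumption about how values between the listed ranges should be handled.

```python
def rank_tier(rank: int) -> str:
    """Describe a component rank using the 1-9 tiers listed above."""
    if not 1 <= rank <= 9:
        raise ValueError(f"rank must be between 1 and 9, got {rank}")
    if rank <= 2:
        return "official or highly trusted"
    if rank <= 4:
        return "verified community project"
    if rank <= 6:
        return "standard open-source project"
    return "lower confidence or derivative"


def score_band(score: float) -> str:
    """Describe a similarity score using the 0-1 bands listed above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    # Assumption: bands are half-open, so each threshold starts the next band.
    if score >= 0.9:
        return "very high"
    if score >= 0.8:
        return "high"
    if score >= 0.7:
        return "moderate"
    return "low"


# Values taken from the response example.
for rank, score in [(1, 0.95), (1, 0.87), (3, 0.78)]:
    print(f"rank {rank}: {rank_tier(rank)}; score {score}: {score_band(score)}")
```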