Why We Track

Telemetry helps us understand how ai-finder is used in the real world so we can make better decisions about product development.

What We Learn

| Question | How Telemetry Helps |
| --- | --- |
| Which output formats are most used? | We track scan.format.cyclonedx vs scan.format.spdx to prioritise SBOM format improvements |
| Is KB enrichment valuable? | We track scan.enrich.enabled and identify.kb_match.found to understand adoption and success rates |
| What errors do users hit? | We track error.scan.file_not_found and error.identify.parse_error to fix common issues |
| How large are typical scans? | We track findings_count buckets to optimise performance for real-world usage |
| Is the KB crawling working? | We track kb.crawl.*.result.success vs kb.crawl.*.errors.yes to monitor data quality |

How This Improves the Product

  1. Prioritise features - If most users use CycloneDX, we focus on CycloneDX improvements
  2. Fix real issues - Error tracking shows us what actually breaks in production
  3. Optimise performance - Understanding typical scan sizes helps us optimise the right code paths
  4. Validate changes - After a release, we can see if error rates decrease
  5. Guide documentation - If users hit the same errors repeatedly, we improve docs

What We Don’t Do

  • We don’t track individual users or sessions
  • We don’t correlate events to identify user behavior patterns
  • We don’t sell or share telemetry data with third parties
  • We don’t use telemetry for advertising or marketing

Privacy Commitments

We never collect:
  • File paths or scan targets
  • PURL values or package names you look up
  • Model file names, hashes, or contents
  • Stack traces or error messages (which could contain paths)
  • Any personally identifiable information (PII)

We only collect:
  • Which commands are run and their options (format, flags)
  • Success/failure status and duration
  • Aggregate counts (files scanned, findings count)
  • Exception type names (e.g., FileNotFoundError, not the message)

Opt-Out

Disable telemetry using any of these methods:

CLI Flag (per-session)

ai-finder --no-telemetry scan /path/to/project

Environment Variables

# AI Finder-specific
export AI_FINDER_TELEMETRY=0

# Universal opt-out standard (https://consoledonottrack.com/)
export DO_NOT_TRACK=1

Config File (persistent)

Create ~/.ai-finder/config.json:
{
  "telemetry": false
}
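
The three opt-out mechanisms above could be resolved in one place, roughly as sketched below. This is an illustration, not the code in telemetry.py: the function name telemetry_enabled, the order of the checks, the exact-match comparison of the environment values, and the "enabled when no config file exists" default are all assumptions. The fail-closed handling of an unreadable config follows the design decision described under Implementation.

```python
import json
import os
from pathlib import Path

# Path taken from the documentation above.
CONFIG_PATH = Path.home() / ".ai-finder" / "config.json"


def telemetry_enabled(no_telemetry_flag=False, environ=None, config_path=CONFIG_PATH):
    """Return True only if no opt-out mechanism is active (hypothetical helper)."""
    env = os.environ if environ is None else environ

    # 1. CLI flag: --no-telemetry
    if no_telemetry_flag:
        return False

    # 2. Environment variables (assumption: only these exact values opt out)
    if env.get("AI_FINDER_TELEMETRY") == "0":
        return False
    if env.get("DO_NOT_TRACK") == "1":
        return False

    # 3. Config file
    try:
        raw = Path(config_path).read_text(encoding="utf-8")
    except FileNotFoundError:
        return True  # assumption: no config file means the default (enabled)
    except (OSError, ValueError):
        return False  # fail closed: unreadable config disables telemetry

    try:
        config = json.loads(raw)
    except ValueError:
        return False  # fail closed: malformed JSON disables telemetry

    if not isinstance(config, dict):
        return False
    return config.get("telemetry", True) is not False
```

Passing environ and config_path explicitly keeps the check easy to test without touching the real environment or home directory.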

Events Collected

All events are designed for funnel analysis - each behavior emits a discrete event that can be counted and visualised as funnel steps.

Lifecycle Events

| Event | Description |
| --- | --- |
| cli.started | CLI was invoked |

Command Events (with properties)

Each command emits started and completed events with properties for detailed analysis:

| Command | Started Properties | Completed Properties |
| --- | --- | --- |
| scan | format, quiet, enrich, relationships | files_scanned, findings_count, output_format, model_count, sdk_count, manifest_count, kb_available, graph_nodes, graph_edges |
| identify | format, enrich | recognised, kb_match, output_format, model_format, kb_available |
| kb.init | - | - |
| kb.status | format | schema_version, total_entries, output_format |
| kb.lookup | format | results_count |
| kb.crawl | source | items_added, error_count |

Discrete Feature Events (for funnels)

These events enable funnel visualisation without parsing properties:

scan

Scan Pipeline Events (ordered funnel)
| Event | Description |
| --- | --- |
| scan.started | Scan execution started |
| scan.discovery.started | File discovery phase started |
| scan.discovery.completed | File discovery completed (with file counts) |
| scan.detection.started | Detection phase started (sdk/manifest/model) |
| scan.detection.completed | Detection phase completed (with finding counts) |
| scan.sdk.found | Individual SDK detected (per SDK) |
| scan.manifest_dep.found | Individual manifest dependency found (per dep) |
| scan.metrics | Overall scan metrics |
| scan.completed | Scan execution completed successfully |
| scan.enrichment.started | KB enrichment phase started |
| scan.enrichment.completed | KB enrichment phase completed |
| scan.output.started | SBOM output generation started |
| scan.output.completed | SBOM output generation completed |

Scan Feature Events

| Event | Description |
| --- | --- |
| scan.format.json | Output format is JSON |
| scan.format.cyclonedx | Output format is CycloneDX SBOM |
| scan.format.spdx | Output format is SPDX SBOM |
| scan.format.text | Output format is text |
| scan.enrich.enabled | KB enrichment is enabled |
| scan.relationships.enabled | Relationship graph is enabled |
| scan.findings.none | No AI artifacts found |
| scan.findings.few | 1-10 AI artifacts found |
| scan.findings.many | 10+ AI artifacts found |
| scan.artifact_type.model | Model files found |
| scan.artifact_type.sdk | SDK usage found |
| scan.artifact_type.manifest | Manifest dependencies found |
| scan.graph_built.success | Relationship graph built |
| scan.kb_source.local | Using local KB cache |
| scan.kb_source.live_only | No local KB, using live APIs |

Complete Scan Funnel

cli.started
 -> command.scan.started
    -> scan.started
       -> scan.discovery.started
       -> scan.discovery.completed
       -> scan.detection.started (phase: sdk)
       -> scan.sdk.found (per SDK)
       -> scan.detection.completed (phase: sdk)
       -> scan.detection.started (phase: manifest)
       -> scan.manifest_dep.found (per dependency)
       -> scan.detection.completed (phase: manifest)
       -> scan.detection.started (phase: model)
       -> scan.detection.completed (phase: model)
       -> scan.metrics
       -> scan.completed
    -> scan.enrichment.started
       -> enrichment.* events
    -> scan.enrichment.completed
    -> scan.output.started
    -> scan.output.completed
 -> command.scan.completed
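
Because each step is a discrete event, a funnel can be computed from plain event counts without inspecting properties or linking events to a user or session. The sketch below shows one way this might be done on the analysis side. The function names and the choice of which steps to include are illustrative assumptions, not part of ai-finder.

```python
from collections import Counter

# A subset of the ordered steps from the funnel above (selection is illustrative).
SCAN_FUNNEL = [
    "cli.started",
    "command.scan.started",
    "scan.started",
    "scan.discovery.completed",
    "scan.completed",
    "scan.output.completed",
    "command.scan.completed",
]


def funnel_counts(event_names, steps=SCAN_FUNNEL):
    """Count how often each funnel step occurs in a flat list of event names."""
    counts = Counter(event_names)
    return [(step, counts[step]) for step in steps]


def conversion_rates(counted):
    """Express each step's count as a fraction of the first step's count."""
    if not counted or counted[0][1] == 0:
        return []
    base = counted[0][1]
    return [(step, n / base) for step, n in counted]
```

A drop between two adjacent steps, for example between scan.discovery.completed and scan.completed, points at the phase where scans most often stop.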

identify

| Event | Description |
| --- | --- |
| identify.format.json | Output format is JSON |
| identify.format.text | Output format is text |
| identify.enrich.enabled | KB enrichment is enabled |
| identify.unknown_extension.{ext} | Unknown file extension encountered |
| identify.recognised.yes | Model file was recognised |
| identify.recognised.no | Model file was not recognised |
| identify.model_format.gguf | Model format is GGUF |
| identify.model_format.safetensors | Model format is SafeTensors |
| identify.model_format.onnx | Model format is ONNX |
| identify.model_format.pytorch | Model format is PyTorch |
| identify.kb_source.local | Using local KB cache |
| identify.kb_source.live_only | No local KB, using live APIs |
| identify.kb_match.found | KB lookup found a match |
| identify.kb_match.not_found | KB lookup found no match |

kb.status

| Event | Description |
| --- | --- |
| kb.status.format.json | Output format is JSON |
| kb.status.format.text | Output format is text |
| kb.status.db.not_found | KB database doesn’t exist |
| kb.status.entries.empty | KB has 0 entries |
| kb.status.entries.small | KB has 1-99 entries |
| kb.status.entries.medium | KB has 100-999 entries |
| kb.status.entries.large | KB has 1000+ entries |
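
The entry-count buckets above could be derived with a small helper like the following. This is a sketch rather than the shipped code: the function name is hypothetical, and treating a negative count as empty is an assumption. Emitting a bucket name instead of the raw number is what lets the event be counted directly.

```python
def kb_entries_event(total_entries):
    """Map a KB entry count to its bucketed event name (hypothetical helper)."""
    if total_entries <= 0:
        return "kb.status.entries.empty"   # 0 entries (negatives treated as empty)
    if total_entries < 100:
        return "kb.status.entries.small"   # 1-99 entries
    if total_entries < 1000:
        return "kb.status.entries.medium"  # 100-999 entries
    return "kb.status.entries.large"       # 1000+ entries
```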

kb.lookup

| Event | Description |
| --- | --- |
| kb.lookup.format.json | Output format is JSON |
| kb.lookup.format.text | Output format is text |
| kb.lookup.result.found | Lookup found results |
| kb.lookup.result.not_found | Lookup found no results |
| kb.lookup.found_type.sdk | Found SDK entries |
| kb.lookup.found_type.model | Found model entries |
| kb.lookup.found_type.package | Found package entries |

kb.crawl

| Event | Description |
| --- | --- |
| kb.crawl.source.huggingface | Crawling HuggingFace |
| kb.crawl.source.pypi | Crawling PyPI |
| kb.crawl.source.npm | Crawling npm |
| kb.crawl.source.all | Crawling all sources |
| kb.crawl.crawler.huggingface | HuggingFace crawler ran |
| kb.crawl.crawler.pypi | PyPI crawler ran |
| kb.crawl.crawler.npm | npm crawler ran |
| kb.crawl.db_init.created | KB was auto-initialised |
| kb.crawl.huggingface.result.success | HuggingFace added items |
| kb.crawl.pypi.result.success | PyPI added items |
| kb.crawl.npm.result.success | npm added items |
| kb.crawl.huggingface.errors.yes | HuggingFace had errors |
| kb.crawl.pypi.errors.yes | PyPI had errors |
| kb.crawl.npm.errors.yes | npm had errors |
| kb.crawl.result.success | Overall crawl added items |
| kb.crawl.result.empty | Overall crawl added nothing |
| kb.crawl.had_errors.yes | Overall crawl had errors |

Enrichment Events (from KBEnricher)

These events are emitted during KB enrichment in scan and identify commands:

| Event | Properties | Description |
| --- | --- | --- |
| enrichment.cache_hit | type | Session cache hit (avoids repeated lookups) |
| enrichment.kb_hit | type, name/ecosystem | Found in local KB cache |
| enrichment.live_fetch | type, source | Successfully fetched from live API |
| enrichment.model_not_found | source, name | Model not found in HuggingFace |
| enrichment.package_not_found | source, name | Package not found in PyPI/npm |
| enrichment.live_fetch_failed | type, source, error_category | Live API fetch failed |
| enrichment.unsupported_ecosystem | ecosystem | Unsupported package ecosystem |

Enrichment error categories:
  • network_error - Connection failed
  • timeout - Request timed out
  • ssl_error - SSL/TLS error
  • not_found - 404 response
  • rate_limited - 429 response
  • auth_error - 401/403 response
  • server_error - 5xx response
  • http_error - Other HTTP error
  • missing_dependency - Required library not installed
  • parse_error - JSON/response parsing failed
  • unknown - Unclassified error
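
The category list above maps naturally onto HTTP status codes and standard exception types. The following sketch shows one plausible classification using only the Python standard library. The function names and the choice of exception classes are assumptions. The real KBEnricher may use an HTTP client with its own exception hierarchy.

```python
import json
import socket
import ssl


def categorise_http_status(status):
    """Map an HTTP error status code to an enrichment error category."""
    if status == 404:
        return "not_found"
    if status == 429:
        return "rate_limited"
    if status in (401, 403):
        return "auth_error"
    if 500 <= status <= 599:
        return "server_error"
    return "http_error"


def categorise_exception(exc):
    """Map a non-HTTP failure to a category; most specific checks come first."""
    if isinstance(exc, ssl.SSLError):  # subclass of OSError
        return "ssl_error"
    if isinstance(exc, (TimeoutError, socket.timeout)):
        return "timeout"
    if isinstance(exc, (ConnectionError, socket.gaierror)):
        return "network_error"
    if isinstance(exc, ImportError):
        return "missing_dependency"
    if isinstance(exc, json.JSONDecodeError):
        return "parse_error"
    return "unknown"
```

Only the returned category string would be attached to enrichment.live_fetch_failed. The exception message itself is not used.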

Error Events (granular)

Errors emit discrete events for funnel analysis:

| Event Pattern | Description |
| --- | --- |
| error.{command}.file_not_found | File not found |
| error.{command}.permission_denied | Permission denied |
| error.{command}.is_directory | Expected file, got directory |
| error.{command}.disk_full | Disk full |
| error.{command}.out_of_memory | Out of memory |
| error.{command}.symlink_loop | Symlink loop detected |
| error.{command}.os_error | Other OS error |
| error.{command}.invalid_value | Invalid value |
| error.{command}.network_error | Network error |
| error.{command}.http_error | HTTP error |
| error.{command}.database_error | Database error |
| error.{command}.parse_error | Parse/decode error |
| error.{command}.encoding_error | Encoding error |
| error.{command}.unknown | Unknown error |

Plus a generic error event with properties for detailed analysis:
  • error_type: Exception class name
  • error_category: Classified category
  • context: Command context
Note: Error messages and stack traces are never sent.
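
To illustrate how an exception can be reported without its message, the sketch below builds both the discrete event name and the generic error event from the exception type alone. The function names, the returned dictionary shape, and the use of sqlite3 for the database category are assumptions. The http_error category is omitted because it depends on the HTTP client in use.

```python
import errno
import json
import sqlite3


def error_category(exc):
    """Classify an exception by type only; specific classes are checked first."""
    if isinstance(exc, FileNotFoundError):
        return "file_not_found"
    if isinstance(exc, PermissionError):
        return "permission_denied"
    if isinstance(exc, IsADirectoryError):
        return "is_directory"
    if isinstance(exc, MemoryError):
        return "out_of_memory"
    if isinstance(exc, ConnectionError):
        return "network_error"
    if isinstance(exc, OSError):
        if exc.errno == errno.ENOSPC:
            return "disk_full"
        if exc.errno == errno.ELOOP:
            return "symlink_loop"
        return "os_error"
    if isinstance(exc, UnicodeError):  # subclass of ValueError
        return "encoding_error"
    if isinstance(exc, json.JSONDecodeError):  # subclass of ValueError
        return "parse_error"
    if isinstance(exc, sqlite3.Error):  # assumption: the KB is a SQLite database
        return "database_error"
    if isinstance(exc, ValueError):
        return "invalid_value"
    return "unknown"


def build_error_events(command, exc):
    """Return (discrete event name, generic event); str(exc) is never read."""
    category = error_category(exc)
    discrete = f"error.{command}.{category}"
    generic = {
        "event": "error",
        "properties": {
            "error_type": type(exc).__name__,  # class name only, e.g. FileNotFoundError
            "error_category": category,
            "context": command,
        },
    }
    return discrete, generic
```

For example, a FileNotFoundError raised for a private path during a scan yields error.scan.file_not_found and an error_type of FileNotFoundError, with the path appearing nowhere in either event.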

Implementation

Telemetry is implemented in packages/ai-finder/src/ai_finder_cli/telemetry.py. Key design decisions:
  1. Fail-closed: If the config file is unreadable or the telemetry library fails to initialise, telemetry is disabled.
  2. Lazy initialisation: The telemetry client is only created on first use, after checking all opt-out mechanisms.
  3. Graceful shutdown: Events are flushed on CLI exit via atexit.
  4. No blocking: Telemetry operations do not block CLI execution.
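
The four design decisions above can be combined in a small wrapper, sketched here with the standard library only. This is not the contents of telemetry.py: the class name LazyTelemetry, its constructor arguments, and the use of a background thread with a queue are all assumptions chosen to illustrate the pattern.

```python
import atexit
import queue
import threading


class LazyTelemetry:
    """Hypothetical wrapper: nothing is created until the first event is tracked."""

    def __init__(self, is_enabled, send):
        self._is_enabled = is_enabled  # callable returning bool (opt-out checks)
        self._send = send              # callable that delivers one event
        self._queue = None
        self._worker = None
        self._disabled = False
        self._lock = threading.Lock()

    def _ensure_started(self):
        with self._lock:
            if self._disabled:
                return False
            if self._worker is not None:
                return True
            try:
                if not self._is_enabled():
                    self._disabled = True
                    return False
                self._queue = queue.Queue()
                self._worker = threading.Thread(target=self._drain, daemon=True)
                self._worker.start()
                atexit.register(self.shutdown)  # flush pending events on exit
            except Exception:
                self._disabled = True  # fail closed on any initialisation error
                return False
            return True

    def _drain(self):
        while True:
            event = self._queue.get()
            if event is None:  # sentinel placed by shutdown()
                return
            try:
                self._send(event)
            except Exception:
                pass  # delivery failures never reach the CLI user

    def track(self, name, **properties):
        """Queue an event and return immediately; sending happens off-thread."""
        if self._ensure_started():
            self._queue.put({"event": name, "properties": properties})

    def shutdown(self, timeout=2.0):
        """Stop accepting events and wait briefly for queued ones to be sent."""
        with self._lock:
            worker = self._worker
            if worker is None or self._disabled:
                return
            self._disabled = True
            self._queue.put(None)
        worker.join(timeout)
```

The bounded join in shutdown keeps a slow or unreachable backend from delaying CLI exit by more than the timeout.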

Data Handling

  • Backend: Events are sent to SCANOSS telemetry infrastructure
  • Retention: Usage data is retained for product analytics purposes
  • Access: Data is accessible only to the SCANOSS engineering team

Questions?

If you have questions about telemetry or privacy, please open an issue at https://github.com/scanoss/ai-finder/issues.