Why We Track
Telemetry helps us understand how ai-finder is used in the real world so we can make better decisions about product development.
What We Learn
| Question | How Telemetry Helps |
|---|---|
| Which output formats are most used? | We track scan.format.cyclonedx vs scan.format.spdx to prioritise SBOM format improvements |
| Is KB enrichment valuable? | We track scan.enrich.enabled and identify.kb_match.found to understand adoption and success rates |
| What errors do users hit? | We track error.scan.file_not_found and error.identify.parse_error to fix common issues |
| How large are typical scans? | We track findings_count buckets to optimise performance for real-world usage |
| Is the KB crawling working? | We track kb.crawl.*.result.success vs kb.crawl.*.errors.yes to monitor data quality |
How This Improves the Product
- Prioritise features - If most users use CycloneDX, we focus on CycloneDX improvements
- Fix real issues - Error tracking shows us what actually breaks in production
- Optimise performance - Understanding typical scan sizes helps us optimise the right code paths
- Validate changes - After a release, we can see if error rates decrease
- Guide documentation - If users hit the same errors repeatedly, we improve docs
What We Don’t Do
- We don’t track individual users or sessions
- We don’t correlate events to identify user behavior patterns
- We don’t sell or share telemetry data with third parties
- We don’t use telemetry for advertising or marketing
Privacy Commitments
We never collect:
- File paths or scan targets
- PURL values or package names you look up
- Model file names, hashes, or contents
- Stack traces or error messages (which could contain paths)
- Any personally identifiable information (PII)
We do collect:
- Which commands are run and their options (format, flags)
- Success/failure status and duration
- Aggregate counts (files scanned, findings count)
- Exception type names (e.g., FileNotFoundError, not the message)
Opt-Out
Disable telemetry using any of these methods:
CLI Flag (per-session)
Environment Variables
Config File (persistent)
Create ~/.ai-finder/config.json:
Events Collected
All events are designed for funnel analysis - each behavior emits a discrete event that can be counted and visualised as funnel steps.
Lifecycle Events
| Event | Description |
|---|---|
| cli.started | CLI was invoked |
Command Events (with properties)
Each command emits started and completed events with properties for detailed analysis:
| Command | Started Properties | Completed Properties |
|---|---|---|
| scan | format, quiet, enrich, relationships | files_scanned, findings_count, output_format, model_count, sdk_count, manifest_count, kb_available, graph_nodes, graph_edges |
| identify | format, enrich | recognised, kb_match, output_format, model_format, kb_available |
| kb.init | - | - |
| kb.status | format | schema_version, total_entries, output_format |
| kb.lookup | format | results_count |
| kb.crawl | source | items_added, error_count |
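The started/completed pairing can be sketched as a context manager. This is only an illustration: `emit`, `track_command`, the in-memory `EVENTS` list, and the `duration_ms` property name are hypothetical and are not taken from telemetry.py; only the event naming and the `format`, `enrich`, `files_scanned`, and `findings_count` properties come from the table above.

```python
import time
from contextlib import contextmanager

# Hypothetical in-memory sink; a real client would queue events for sending.
EVENTS = []


def emit(name, properties=None):
    """Record one event with an optional dict of properties."""
    EVENTS.append((name, dict(properties or {})))


@contextmanager
def track_command(command, **started_props):
    """Emit <command>.started on entry and <command>.completed on clean exit."""
    emit(f"{command}.started", started_props)
    completed_props = {}
    start = time.monotonic()
    yield completed_props  # the command body fills this in
    # Only reached when the body did not raise.
    completed_props["duration_ms"] = int((time.monotonic() - start) * 1000)
    emit(f"{command}.completed", completed_props)


# Usage: a pretend scan that reports two of the documented properties.
with track_command("scan", format="cyclonedx", enrich=True) as done:
    done["files_scanned"] = 42
    done["findings_count"] = 3
```

If the body raises, no completed event is recorded in this sketch, which keeps the started/completed counts usable as a success ratio.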
Discrete Feature Events (for funnels)
These events enable funnel visualisation without parsing properties:
scan
Scan Pipeline Events (ordered funnel)
| Event | Description |
|---|---|
| scan.started | Scan execution started |
| scan.discovery.started | File discovery phase started |
| scan.discovery.completed | File discovery completed (with file counts) |
| scan.detection.started | Detection phase started (sdk/manifest/model) |
| scan.detection.completed | Detection phase completed (with finding counts) |
| scan.sdk.found | Individual SDK detected (per SDK) |
| scan.manifest_dep.found | Individual manifest dependency found (per dep) |
| scan.metrics | Overall scan metrics |
| scan.completed | Scan execution completed successfully |
| scan.enrichment.started | KB enrichment phase started |
| scan.enrichment.completed | KB enrichment phase completed |
| scan.output.started | SBOM output generation started |
| scan.output.completed | SBOM output generation completed |
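To show why discrete events make funnels easy to build, here is a minimal sketch that counts event names and reports each step as a share of the first step. The `funnel` helper, the choice of four steps, and the sample data are made up for illustration; only the event names come from the table above.

```python
from collections import Counter

# A subset of the pipeline events, in pipeline order.
FUNNEL_STEPS = [
    "scan.started",
    "scan.discovery.completed",
    "scan.detection.completed",
    "scan.completed",
]


def funnel(event_names, steps=FUNNEL_STEPS):
    """Return (step, count, share of first step) for each funnel step."""
    counts = Counter(event_names)
    base = counts[steps[0]]
    rows = []
    for step in steps:
        share = counts[step] / base if base else 0.0
        rows.append((step, counts[step], share))
    return rows


# Made-up sample: four scans start, three finish discovery, two complete.
sample = (
    ["scan.started"] * 4
    + ["scan.discovery.completed"] * 3
    + ["scan.detection.completed"] * 2
    + ["scan.completed"] * 2
)
for step, count, share in funnel(sample):
    print(f"{step:28} {count:3d} {share:6.0%}")
```

No property parsing is needed: counting names is enough to see where scans drop out.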
Scan Feature Events
| Event | Description |
|---|---|
| scan.format.json | Output format is JSON |
| scan.format.cyclonedx | Output format is CycloneDX SBOM |
| scan.format.spdx | Output format is SPDX SBOM |
| scan.format.text | Output format is text |
| scan.enrich.enabled | KB enrichment is enabled |
| scan.relationships.enabled | Relationship graph is enabled |
| scan.findings.none | No AI artifacts found |
| scan.findings.few | 1-10 AI artifacts found |
| scan.findings.many | 10+ AI artifacts found |
| scan.artifact_type.model | Model files found |
| scan.artifact_type.sdk | SDK usage found |
| scan.artifact_type.manifest | Manifest dependencies found |
| scan.graph_built.success | Relationship graph built |
| scan.kb_source.local | Using local KB cache |
| scan.kb_source.live_only | No local KB, using live APIs |
Complete Scan Funnel
identify
| Event | Description |
|---|---|
| identify.format.json | Output format is JSON |
| identify.format.text | Output format is text |
| identify.enrich.enabled | KB enrichment is enabled |
| identify.unknown_extension.{ext} | Unknown file extension encountered |
| identify.recognised.yes | Model file was recognised |
| identify.recognised.no | Model file was not recognised |
| identify.model_format.gguf | Model format is GGUF |
| identify.model_format.safetensors | Model format is SafeTensors |
| identify.model_format.onnx | Model format is ONNX |
| identify.model_format.pytorch | Model format is PyTorch |
| identify.kb_source.local | Using local KB cache |
| identify.kb_source.live_only | No local KB, using live APIs |
| identify.kb_match.found | KB lookup found a match |
| identify.kb_match.not_found | KB lookup found no match |
kb.status
| Event | Description |
|---|---|
| kb.status.format.json | Output format is JSON |
| kb.status.format.text | Output format is text |
| kb.status.db.not_found | KB database doesn’t exist |
| kb.status.entries.empty | KB has 0 entries |
| kb.status.entries.small | KB has 1-99 entries |
| kb.status.entries.medium | KB has 100-999 entries |
| kb.status.entries.large | KB has 1000+ entries |
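The size buckets above amount to a simple range check. A minimal sketch follows; the function name `entries_bucket_event` is hypothetical, while the bucket boundaries and event names are the ones listed in the table.

```python
def entries_bucket_event(total_entries):
    """Map a KB entry count to the matching kb.status.entries.* event name."""
    if total_entries < 0:
        raise ValueError("total_entries must be non-negative")
    if total_entries == 0:
        bucket = "empty"
    elif total_entries < 100:
        bucket = "small"    # 1-99 entries
    elif total_entries < 1000:
        bucket = "medium"   # 100-999 entries
    else:
        bucket = "large"    # 1000+ entries
    return f"kb.status.entries.{bucket}"
```

Reporting only the bucket, rather than the exact count, is what allows the event to be counted directly as a funnel step.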
kb.lookup
| Event | Description |
|---|---|
| kb.lookup.format.json | Output format is JSON |
| kb.lookup.format.text | Output format is text |
| kb.lookup.result.found | Lookup found results |
| kb.lookup.result.not_found | Lookup found no results |
| kb.lookup.found_type.sdk | Found SDK entries |
| kb.lookup.found_type.model | Found model entries |
| kb.lookup.found_type.package | Found package entries |
kb.crawl
| Event | Description |
|---|---|
| kb.crawl.source.huggingface | Crawling HuggingFace |
| kb.crawl.source.pypi | Crawling PyPI |
| kb.crawl.source.npm | Crawling npm |
| kb.crawl.source.all | Crawling all sources |
| kb.crawl.crawler.huggingface | HuggingFace crawler ran |
| kb.crawl.crawler.pypi | PyPI crawler ran |
| kb.crawl.crawler.npm | npm crawler ran |
| kb.crawl.db_init.created | KB was auto-initialised |
| kb.crawl.huggingface.result.success | HuggingFace added items |
| kb.crawl.pypi.result.success | PyPI added items |
| kb.crawl.npm.result.success | npm added items |
| kb.crawl.huggingface.errors.yes | HuggingFace had errors |
| kb.crawl.pypi.errors.yes | PyPI had errors |
| kb.crawl.npm.errors.yes | npm had errors |
| kb.crawl.result.success | Overall crawl added items |
| kb.crawl.result.empty | Overall crawl added nothing |
| kb.crawl.had_errors.yes | Overall crawl had errors |
Enrichment Events (from KBEnricher)
These events are emitted during KB enrichment in scan and identify commands:
| Event | Properties | Description |
|---|---|---|
| enrichment.cache_hit | type | Session cache hit (avoids repeated lookups) |
| enrichment.kb_hit | type, name/ecosystem | Found in local KB cache |
| enrichment.live_fetch | type, source | Successfully fetched from live API |
| enrichment.model_not_found | source, name | Model not found in HuggingFace |
| enrichment.package_not_found | source, name | Package not found in PyPI/npm |
| enrichment.live_fetch_failed | type, source, error_category | Live API fetch failed |
| enrichment.unsupported_ecosystem | ecosystem | Unsupported package ecosystem |
The error_category property is one of:
- network_error - Connection failed
- timeout - Request timed out
- ssl_error - SSL/TLS error
- not_found - 404 response
- rate_limited - 429 response
- auth_error - 401/403 response
- server_error - 5xx response
- http_error - Other HTTP error
- missing_dependency - Required library not installed
- parse_error - JSON/response parsing failed
- unknown - Unclassified error
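The HTTP-related categories follow directly from the response status code. A minimal sketch, assuming the caller only passes status codes of failed responses; `classify_http_status` is a hypothetical name, not necessarily the function used in KBEnricher.

```python
def classify_http_status(status):
    """Map a failed response's HTTP status code to an error_category value."""
    if status == 404:
        return "not_found"
    if status == 429:
        return "rate_limited"
    if status in (401, 403):
        return "auth_error"
    if 500 <= status <= 599:
        return "server_error"
    return "http_error"  # any other HTTP failure
```

The non-HTTP categories (network_error, timeout, ssl_error, and so on) would have to be derived from the exception raised by the HTTP client rather than from a status code.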
Error Events (granular)
Errors emit discrete events for funnel analysis:
| Event Pattern | Description |
|---|---|
| error.{command}.file_not_found | File not found |
| error.{command}.permission_denied | Permission denied |
| error.{command}.is_directory | Expected file, got directory |
| error.{command}.disk_full | Disk full |
| error.{command}.out_of_memory | Out of memory |
| error.{command}.symlink_loop | Symlink loop detected |
| error.{command}.os_error | Other OS error |
| error.{command}.invalid_value | Invalid value |
| error.{command}.network_error | Network error |
| error.{command}.http_error | HTTP error |
| error.{command}.database_error | Database error |
| error.{command}.parse_error | Parse/decode error |
| error.{command}.encoding_error | Encoding error |
| error.{command}.unknown | Unknown error |
Each error also emits an error event with properties for detailed analysis:
- error_type: Exception class name
- error_category: Classified category
- context: Command context
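One way to derive both the discrete event name and the error event properties from a Python exception is sketched below. The mapping from exception classes to categories is an assumption made for illustration and covers only part of the table (the network, HTTP, and database categories are omitted); `error_category` and `error_events` are hypothetical helper names.

```python
import errno
import json


def error_category(exc):
    """Classify an exception into one of the documented category suffixes."""
    # Most specific checks first: several of these classes share a parent.
    if isinstance(exc, FileNotFoundError):
        return "file_not_found"
    if isinstance(exc, PermissionError):
        return "permission_denied"
    if isinstance(exc, IsADirectoryError):
        return "is_directory"
    if isinstance(exc, MemoryError):
        return "out_of_memory"
    if isinstance(exc, OSError):
        if exc.errno == errno.ENOSPC:
            return "disk_full"
        if exc.errno == errno.ELOOP:
            return "symlink_loop"
        return "os_error"
    # JSONDecodeError and UnicodeError both subclass ValueError.
    if isinstance(exc, json.JSONDecodeError):
        return "parse_error"
    if isinstance(exc, UnicodeError):
        return "encoding_error"
    if isinstance(exc, ValueError):
        return "invalid_value"
    return "unknown"


def error_events(command, exc):
    """Build the discrete event name and the error event properties."""
    category = error_category(exc)
    discrete = f"error.{command}.{category}"
    properties = {
        "error_type": type(exc).__name__,  # class name only, never str(exc)
        "error_category": category,
        "context": command,
    }
    return discrete, properties
```

Because only `type(exc).__name__` is read, a path embedded in the exception message never reaches the payload, which matches the privacy commitment about error messages.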
Implementation
Telemetry is implemented in packages/ai-finder/src/ai_finder_cli/telemetry.py. Key design decisions:
- Fail-closed: If the config file is unreadable or the telemetry library fails to initialise, telemetry is disabled.
- Lazy initialisation: The telemetry client is only created on first use, after checking all opt-out mechanisms.
- Graceful shutdown: Events are flushed on CLI exit via atexit.
- No blocking: Telemetry operations do not block CLI execution.
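The design decisions above can be sketched together as follows. This is not the code in telemetry.py: the `"telemetry"` config key, the `telemetry_enabled` function, and the `LazyClient` class are all hypothetical, and treating a missing config file as "no opt-out recorded" is an assumption. Only the config path and the four behaviours (fail-closed, lazy, flush via atexit, non-blocking) come from the text.

```python
import atexit
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".ai-finder" / "config.json"


def telemetry_enabled(config_path=CONFIG_PATH):
    """Fail closed: any problem reading the config disables telemetry."""
    try:
        if not config_path.exists():
            return True  # assumption: no config file means no opt-out
        config = json.loads(config_path.read_text(encoding="utf-8"))
        # "telemetry" is a hypothetical key; the real schema may differ.
        return bool(config.get("telemetry", True))
    except (OSError, ValueError, AttributeError):
        return False  # unreadable or malformed config: stay silent


class LazyClient:
    """Defers all setup until the first event is emitted."""

    def __init__(self, enabled_check):
        self._enabled_check = enabled_check
        self._initialised = False
        self._enabled = False
        self.queue = []

    def emit(self, name):
        if not self._initialised:
            self._initialised = True
            try:
                self._enabled = bool(self._enabled_check())
            except Exception:
                self._enabled = False  # fail closed on any init error
            if self._enabled:
                atexit.register(self.flush)  # flush once at interpreter exit
        if self._enabled:
            self.queue.append(name)  # cheap append, no network on this path

    def flush(self):
        """Hand back queued events; a real client would send them here."""
        sent, self.queue = self.queue, []
        return sent
```

Queuing in memory and sending only in `flush` is one way to keep telemetry off the CLI's critical path; a real client might instead send from a background thread.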
Data Handling
- Backend: Events are sent to SCANOSS telemetry infrastructure
- Retention: Usage data is retained for product analytics purposes
- Access: Data is only accessible to the SCANOSS engineering team