Data Storage and Retention

Confirmed Storage Components

Relational storage (PostgreSQL)

  • Hub schema registry and validators define/create many tables:
    • Aventora-Assistant/db/schema_registry.py
    • Aventora-Assistant/db/schema_validator.py
  • domain-chatbot runtime DB initialization and migrations create application tables:
    • domain-chatbot/LLM_full/db_operations.py

File/log storage

  • Rotating file logging in both systems:
    • Aventora-Assistant/server/server.py
    • domain-chatbot/logging_config.py
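
For orientation, size-based log rotation with Python's standard library can be sketched as below. This is a minimal illustration, not the configuration used in server.py or logging_config.py; the function name, size limit, and backup count are assumptions chosen for the example.

```python
import logging
from logging.handlers import RotatingFileHandler


def build_rotating_logger(name: str, path: str,
                          max_bytes: int = 5_000_000,
                          backup_count: int = 5) -> logging.Logger:
    """Return a logger that writes to `path` and rotates it by size.

    The limits are illustrative assumptions, not values from the repository.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Avoid stacking duplicate handlers if this is called more than once.
    if not any(isinstance(h, RotatingFileHandler) for h in logger.handlers):
        handler = RotatingFileHandler(path, maxBytes=max_bytes,
                                      backupCount=backup_count,
                                      encoding="utf-8")
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(levelname)s %(name)s %(message)s"))
        logger.addHandler(handler)
    return logger
```

With this kind of handler, the oldest rotated file is deleted once backup_count files exist, so on-disk log retention is bounded by volume rather than by age.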

Sensitive Data at Rest (Confirmed)

  • Hub stores:
    • account-level domain_chatbot_api_key in accounts paths (Aventora-Assistant/db/account_manager.py)
    • OAuth access/refresh tokens in users paths (Aventora-Assistant/db/user_manager.py)
  • domain-chatbot stores:
    • user credentials (bcrypt hashes)
    • domain API keys and temporary access tokens
    • submissions/intake records and domain-specific settings
    • source for the items above: domain-chatbot/LLM_full/db_operations.py

Data Protection Controls (Confirmed)

  • API keys for Hub API access are stored only as hashes (SHA-256 of the generated key):
    • Aventora-Assistant/db/api_key_manager.py
  • Inbound secure-link tokens are hashed before being persisted to the database:
    • Aventora-Assistant/db/inbound_secure_link_manager.py
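
The hash-before-persist pattern described above can be sketched as follows. This is a minimal illustration using only the standard library; the function names are hypothetical and do not reflect the actual interfaces of api_key_manager.py or inbound_secure_link_manager.py.

```python
import hashlib
import hmac
import secrets


def generate_api_key() -> tuple[str, str]:
    """Return (plaintext_key, sha256_hex_digest).

    Only the digest would be persisted; the plaintext is shown to the
    caller once and then discarded. Function name is hypothetical.
    """
    plaintext = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe text
    digest = hashlib.sha256(plaintext.encode("utf-8")).hexdigest()
    return plaintext, digest


def verify_api_key(presented: str, stored_digest: str) -> bool:
    """Hash the presented key and compare it to the stored digest."""
    candidate = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(candidate, stored_digest)
```

An unsalted SHA-256 digest is reasonable here because the keys are long random values; low-entropy secrets such as user passwords need a slow, salted scheme such as the bcrypt hashes noted earlier.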

Not Verifiable from Repository Alone

  • Encryption at rest for PostgreSQL volumes (depends on cloud/disk/database config outside code).
  • Centralized retention policy enforcement and legal hold process.
  • Automatic archival of logs to immutable storage, and any policy governing it.

Retention/Deletion Signals in Code (Partial)

  • Cleanup and status update routines exist for some workflows (sessions, workers, call records).
  • No single, centrally enforced retention engine was identified for all sensitive tables.

Gaps / Risks

  1. Secret fields such as OAuth refresh tokens and account-linked domain API keys appear to be persisted in recoverable, plaintext-style form for operational use; no field-level encryption wrapper is evident in these modules.
  2. The repository includes sample env files in domain-chatbot that contain real-looking secret values:
    • domain-chatbot/.env.sample
    • domain-chatbot/.env.template
  3. No comprehensive, code-level retention matrix was identified that ties table classes (PII, secrets, telemetry, transcripts) to retention rules.
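
As an illustration of the second gap, a simple check could flag secret-like variables in sample env files whose values do not look like placeholders. The sketch below is a heuristic under stated assumptions: the name and placeholder patterns are hypothetical and would need tuning to the project's conventions.

```python
import re

# Hypothetical heuristics; adjust both patterns to the project's conventions.
SECRET_NAME = re.compile(r"(KEY|SECRET|TOKEN|PASSWORD)", re.IGNORECASE)
PLACEHOLDER = re.compile(
    r"^(|<.*>|your[-_].*|changeme|xxx+|\*+|example.*|replace[-_]?me)$",
    re.IGNORECASE,
)


def find_suspect_values(env_text: str) -> list[str]:
    """Return names of secret-like variables with non-placeholder values."""
    suspects = []
    for raw in env_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines that are not assignments.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        name = name.strip()
        value = value.strip().strip("'\"")
        if SECRET_NAME.search(name) and not PLACEHOLDER.match(value):
            suspects.append(name)
    return suspects
```

A check of this kind could run in CI against .env.sample and .env.template. It does not replace rotating any value that has already been committed.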

Recommendations

  1. Introduce application-level envelope encryption for high-sensitivity fields (OAuth refresh tokens, account-linked external API secrets).
  2. Rotate and purge any secrets exposed in committed sample/template env files, and replace with placeholders.
  3. Create a machine-readable retention policy map (table -> data class -> retention period -> deletion method).
  4. Add scheduled deletion/archival jobs with auditable execution logs for sensitive datasets.
  5. Add a data inventory document aligned with compliance scopes (PII, PHI-like fields, credentials, operational metadata).
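
The retention policy map in recommendation 3 could take a shape like the following. This is a minimal sketch: the table names, data classes, retention periods, and deletion methods are placeholders invented for illustration, not values taken from the repository.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class DataClass(Enum):
    PII = "pii"
    SECRET = "secret"
    TELEMETRY = "telemetry"
    TRANSCRIPT = "transcript"


@dataclass(frozen=True)
class RetentionRule:
    table: str
    data_class: DataClass
    retention: timedelta
    deletion_method: str  # e.g. "hard_delete" or "anonymize"


# Hypothetical entries; real table names and periods must come from the
# data inventory and the applicable compliance requirements.
RETENTION_POLICY = [
    RetentionRule("example_sessions", DataClass.TELEMETRY,
                  timedelta(days=30), "hard_delete"),
    RetentionRule("example_submissions", DataClass.PII,
                  timedelta(days=365), "anonymize"),
]


def cutoff_for(rule: RetentionRule,
               now: Optional[datetime] = None) -> datetime:
    """Return the timestamp before which rows in rule.table are expired."""
    now = now or datetime.now(timezone.utc)
    return now - rule.retention
```

A scheduled deletion job (recommendation 4) could iterate over such a list, compute each cutoff, and log the table, cutoff, and affected row count for audit purposes.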