
Business Continuity and Disaster Recovery

Confirmed Recovery-Relevant Implementations

  • Containerized deployment artifacts and startup scripts exist:
    • Aventora-Assistant/Dockerfile, Aventora-Assistant/docker-compose.yml
    • domain-chatbot/Dockerfile, domain-chatbot/Dockerfile.agent
    • domain-chatbot/deploy_to_production.sh
  • DB schema initialization/migration logic exists in both systems.
  • domain-chatbot includes an index backup endpoint path:
    • domain-chatbot/LLM_full/index/router.py (/regenerate-index/backup)

What Is Not Proven in Code

  • Automated PostgreSQL backup scheduling and retention policy enforcement.
  • Cross-region replication/failover architecture.
  • Recovery testing cadence (restore drills, failover exercises).
  • Formal RTO/RPO (recovery time objective / recovery point objective) definitions.
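
For illustration only, retention enforcement of the kind not found in code could be as small as the Python sketch below. It assumes backups are identified by their creation timestamps and applies a hypothetical policy: keep the newest `keep_last` backups plus any backup younger than `max_age`. The function name and both defaults are placeholders, not values from either codebase.

```python
from datetime import datetime, timedelta, timezone


def backups_to_prune(
    backup_times: list[datetime],
    now: datetime,
    keep_last: int = 7,                       # hypothetical count floor
    max_age: timedelta = timedelta(days=30),  # hypothetical age window
) -> list[datetime]:
    """Return backups that fall outside the (hypothetical) retention policy.

    A backup is kept if it is one of the `keep_last` newest backups
    or is younger than `max_age`; all others are returned for deletion.
    """
    newest_first = sorted(backup_times, reverse=True)
    return [
        taken_at
        for rank, taken_at in enumerate(newest_first)
        if rank >= keep_last and now - taken_at >= max_age
    ]


# Example: one backup every 5 days over 60 days.
now = datetime(2024, 6, 30, tzinfo=timezone.utc)
backups = [now - timedelta(days=d) for d in range(0, 60, 5)]
for taken_at in backups_to_prune(backups, now):
    print("prune", taken_at.date())
```

Combining a count floor with an age window avoids deleting the only remaining backups after a long gap in the backup schedule.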

Risk Assessment

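  • Application restart and redeploy capabilities appear to be in place (Dockerfiles, a compose file, and a production deploy script).
  • Data durability is under-documented at the code level; it depends on database and infrastructure operations that are not captured in the repositories.
  • A knowledge-base (index) backup endpoint exists, but on its own it does not demonstrate full-system DR maturity.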

Gaps

  1. No consolidated DR runbook was found covering both the Hub and domain-chatbot.
  2. No in-repo evidence of periodic restore validation automation.
  3. No explicit dependency continuity strategy for telephony/LLM/OAuth outages.
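
One shape a dependency continuity strategy could take is an explicit fallback per provider call, so that an outage yields a defined degraded response instead of an unhandled error. The Python sketch below assumes each provider call can be wrapped in a zero-argument callable; `call_with_fallback`, `ProviderResult`, and the simulated provider functions are hypothetical and not taken from either codebase.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass
class ProviderResult(Generic[T]):
    value: T
    degraded: bool        # True when the fallback produced the value
    error: Optional[str]  # repr of the primary failure, if any


def call_with_fallback(primary: Callable[[], T], fallback: Callable[[], T]) -> ProviderResult[T]:
    """Call the primary provider; if it raises, serve the fallback and flag degraded mode."""
    try:
        return ProviderResult(value=primary(), degraded=False, error=None)
    except Exception as exc:  # real code should catch provider-specific errors
        return ProviderResult(value=fallback(), degraded=True, error=repr(exc))


# Simulated LLM outage with a canned degraded-mode reply (both hypothetical).
def ask_llm() -> str:
    raise TimeoutError("LLM provider timed out")


def canned_reply() -> str:
    return "Automated answers are temporarily unavailable; a team member will follow up."


result = call_with_fallback(ask_llm, canned_reply)
print(result.degraded, result.error)
```

A production version would also need timeouts, narrower exception handling, and an alert or metric whenever `degraded` is true, so that silent degradation does not go unnoticed.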

Recommendations

  1. Define and document RTO/RPO by service and data class.
  2. Implement automated, encrypted DB backups with tested restore procedures.
  3. Add scheduled restore verification in a non-production environment.
  4. Define degraded-mode behavior when third-party providers fail.
  5. Document service dependency map and failover sequence.
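
Recommendations 1 and 3 are easier to enforce when the targets are recorded in a machine-checkable form. A minimal Python sketch, assuming one target per (service, data class) pair and that backup timestamps and drill restore durations are collected elsewhere; `RecoveryTarget`, `check_targets`, the data-class labels, and all example durations are hypothetical placeholders, not agreed objectives.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class RecoveryTarget:
    service: str     # e.g. "Hub" or "domain-chatbot"
    data_class: str  # hypothetical label, e.g. "application DB"
    rto: timedelta   # maximum tolerable time to restore the service
    rpo: timedelta   # maximum tolerable window of data loss


def check_targets(
    targets: list[RecoveryTarget],
    last_backup: dict[tuple[str, str], datetime],
    measured_restore: dict[tuple[str, str], timedelta],
    now: datetime,
) -> list[str]:
    """Compare backup ages and drill restore times against documented targets."""
    violations = []
    for t in targets:
        key = (t.service, t.data_class)
        label = f"{t.service}/{t.data_class}"
        backup_at = last_backup.get(key)
        if backup_at is None:
            violations.append(f"{label}: no backup recorded")
        elif now - backup_at > t.rpo:
            violations.append(f"{label}: last backup is older than RPO {t.rpo}")
        restore_time = measured_restore.get(key)
        if restore_time is None:
            violations.append(f"{label}: restore never tested")
        elif restore_time > t.rto:
            violations.append(f"{label}: restore took {restore_time}, exceeds RTO {t.rto}")
    return violations


# Placeholder targets and measurements only.
now = datetime(2024, 6, 30, 12, 0, tzinfo=timezone.utc)
targets = [
    RecoveryTarget("Hub", "application DB", rto=timedelta(hours=4), rpo=timedelta(hours=24)),
    RecoveryTarget("domain-chatbot", "knowledge index", rto=timedelta(hours=8), rpo=timedelta(days=7)),
]
last_backup = {
    ("Hub", "application DB"): now - timedelta(hours=30),
    ("domain-chatbot", "knowledge index"): now - timedelta(days=2),
}
measured_restore = {("Hub", "application DB"): timedelta(hours=1)}
for violation in check_targets(targets, last_backup, measured_restore, now):
    print(violation)
```

Treating an untested restore as a violation makes the absence of restore drills visible in the same report as stale backups.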

Suggested DR Test Matrix

  1. DB restore from the latest backup.
  2. Cold start of the Hub and domain-chatbot after configuration loss.
  3. Recovery with one provider unavailable (Twilio or OpenAI).
  4. Reindexing and knowledge-base restoration validation.
  5. End-to-end smoke test after recovery.
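
The test matrix can be driven by a small runner so that each drill produces a recorded pass/fail result per scenario. The Python sketch below assumes each scenario is a zero-argument check that raises on failure; `run_drill`, `DrillResult`, and the placeholder checks are hypothetical, and real checks would perform the restore, cold start, provider-outage, reindex, and smoke-test steps in the matrix.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DrillResult:
    scenario: str
    passed: bool
    detail: str


def run_drill(scenarios: list[tuple[str, Callable[[], None]]]) -> list[DrillResult]:
    """Run scenarios in order; a scenario passes if its check returns without raising."""
    results = []
    for name, check in scenarios:
        try:
            check()
            results.append(DrillResult(name, True, "ok"))
        except Exception as exc:
            results.append(DrillResult(name, False, repr(exc)))
    return results


# Placeholder checks; real ones would restore, restart, and probe the services.
def restore_db_from_latest_backup() -> None:
    pass  # e.g. restore into a scratch database and compare row counts


def end_to_end_smoke_test() -> None:
    raise AssertionError("simulated failure: health check did not pass")


matrix = [
    ("DB restore from the latest backup", restore_db_from_latest_backup),
    ("End-to-end smoke test after recovery", end_to_end_smoke_test),
]
for r in run_drill(matrix):
    print("PASS" if r.passed else "FAIL", r.scenario, r.detail)
```

The runner deliberately continues after a failure so that one drill reports every failing scenario; a team might instead choose to stop before the smoke test when an earlier restore step fails.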