Business Continuity and Disaster Recovery

Confirmed Recovery-Relevant Implementations

Containerized deployment artifacts and startup scripts exist:
- Aventora-Assistant/Dockerfile, Aventora-Assistant/docker-compose.yml
- domain-chatbot/Dockerfile, domain-chatbot/Dockerfile.agent
- domain-chatbot/deploy_to_production.sh
DB schema initialization/migration logic exists in both systems.
domain-chatbot includes an index backup endpoint path:
- domain-chatbot/LLM_full/index/router.py (/regenerate-index/backup)

What Is Not Proven in Code

Automated PostgreSQL backup scheduling and retention policy enforcement.
Cross-region replication/failover architecture.
Recovery testing cadence (restore drills, failover exercises).
Formal RTO/RPO definitions.

Risk Assessment

Application restart and redeploy capabilities appear to be present, based on the Dockerfiles and deployment script listed above.
Data durability is not demonstrated at code level; it depends on DB and infrastructure operations that the code does not show.
A knowledge-base backup endpoint exists, but on its own it does not demonstrate full-system DR maturity.

Gaps

No consolidated DR runbook was found that covers both Hub and domain-chatbot.
No in-repo evidence of periodic restore validation automation.
No explicit dependency continuity strategy for telephony/LLM/OAuth outages.
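
As an illustration of what a dependency continuity strategy could look like for the LLM path, the sketch below wraps a provider call and returns a canned reply when the provider is down. This is not taken from either codebase: `ProviderUnavailable`, `Reply`, and `answer_with_fallback` are hypothetical names, and the fallback message is a placeholder.

```python
from dataclasses import dataclass
from typing import Callable


class ProviderUnavailable(Exception):
    """Hypothetical error raised when an upstream provider cannot be reached."""


@dataclass
class Reply:
    text: str
    degraded: bool  # True when the fallback path produced the reply


def answer_with_fallback(
    question: str,
    primary: Callable[[str], str],
    fallback_text: str = "The assistant is temporarily unavailable. Please try again later.",
) -> Reply:
    """Call the primary provider; on outage or timeout, return a canned degraded reply."""
    try:
        return Reply(text=primary(question), degraded=False)
    except (ProviderUnavailable, TimeoutError):
        # Degraded mode: the caller still gets a response and can log or alert on the flag.
        return Reply(text=fallback_text, degraded=True)
```

The `degraded` flag lets callers count fallback replies, which gives an outage signal without parsing the reply text. Telephony and OAuth outages would likely need different fallbacks (for example, queueing or read-only sessions), which this sketch does not cover.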

Recommendations

Define and document RTO/RPO by service and data class.
Implement automated, encrypted DB backups with tested restore procedures.
Add scheduled restore verification in a non-production environment.
Define degraded-mode behavior when third-party providers fail.
Document service dependency map and failover sequence.
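
The first two recommendations can be tied together by recording RTO/RPO targets as data and checking backup age against them. The following is a minimal sketch under stated assumptions: the service and data-class labels, the target durations, and the `rpo_violations` helper are all hypothetical, and real targets would need business sign-off.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class RecoveryTarget:
    service: str
    data_class: str
    rto: timedelta  # maximum tolerated downtime
    rpo: timedelta  # maximum tolerated data-loss window


# Placeholder values for illustration only, not agreed targets.
TARGETS = [
    RecoveryTarget("hub", "postgresql", rto=timedelta(hours=4), rpo=timedelta(hours=1)),
    RecoveryTarget("domain-chatbot", "knowledge-index", rto=timedelta(hours=8), rpo=timedelta(hours=24)),
]


def rpo_violations(
    last_backup_at: dict[tuple[str, str], datetime], now: datetime
) -> list[str]:
    """Return one message per target whose latest backup is missing or older than its RPO."""
    violations = []
    for target in TARGETS:
        key = (target.service, target.data_class)
        last = last_backup_at.get(key)
        if last is None:
            violations.append(f"{target.service}/{target.data_class}: no backup recorded")
        elif now - last > target.rpo:
            age = now - last
            violations.append(
                f"{target.service}/{target.data_class}: backup is {age} old, RPO is {target.rpo}"
            )
    return violations


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Hypothetical backup timestamps; in practice these would come from the backup tooling.
    backups = {("hub", "postgresql"): now - timedelta(hours=2)}
    for message in rpo_violations(backups, now):
        print(message)
```

A check like this only confirms that a backup exists and is recent enough. It says nothing about whether the backup can be restored, which is why restore verification is listed as a separate recommendation.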

Suggested DR Test Matrix

DB restore from latest backup.
Cold start of Hub and domain-chatbot after config loss.
Recovery with one provider unavailable (Twilio or OpenAI).
Reindex and knowledge restoration validation.
End-to-end smoke test after recovery.
