IntakeBot — Conversational Form Intake: Requirements and Design

Conversational form intake engine. No dynamic forms are built or shown to the user. The bot collects information purely through conversation (chat or voice). Forms are defined in JSON and stored in a form repository; the Agent runs in intake mode when started with --FORMS. Data is written to an intake folder, timestamped and prefixed by form name, with partial data persisted as the LLM collects it.

1. Overview

1.1 Purpose

IntakeBot is a conversational form intake engine. The bot:

Talks the user through a form: introduces the form, asks for each field in order, and collects answers via conversation only (chat or voice).
Does not render or display any form UI — no on-screen fields, cards, or dynamic forms. The user never sees a “form”; they only converse with the bot.
When multiple forms exist, helps the user choose which form to fill using intent instructions and explicit options.
Persists data into an intake folder: one timestamped file per intake session, updated once per user turn (as fields are collected) so partial data is always on disk.

1.2 Design Principles

Conversation-only — No forms are built or shown. All collection happens via dialogue.
Form-depo + intake mode — Form definitions live in Agent/form-depo/. The Agent is started with --FORMS to run in intake mode (form intake only, no domain RAG/Q&A).
Form-agnostic — Any form that fits the JSON schema works. The schema includes intent/selection instructions so the LLM can decide which form the user wants when there are many.
Partial data always — Partial data is written once per user turn (after all record_field and any submit_form for that turn). We always have up-to-date partial data on disk, not only on completion.

1.3 Out of Scope (First Version)

Dynamic form UI, cards, or progress bars. Purely conversational.
Anonymous intake without a defined identity (e.g. session id) for the intake file.
Conditional fields, pattern validation, min/max. Deferred.

2. Form Repository (form-depo) and Intake Mode

2.1 Form-depo Folder

Location: When FORM_DEPO_PATH is unset or empty, use <project_root>/Agent/form-depo (domain-chatbot project root). Set the FORM_DEPO_PATH env var to override.
Content: One JSON file per form definition. File names can match form id (e.g. contact_form.json) or be arbitrary; the form’s id inside the JSON is the canonical identifier.
Loading: The API (service_formintake) loads forms from the resolved path: it scans *.json and validates each file as a form schema. The forms loaded this way are the only ones available for intake.
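
The loading step above can be sketched roughly as follows. This is a minimal illustration, not the service's actual code: the function name load_forms and the REQUIRED_KEYS check are assumptions, and the real schema validation may be stricter.

```python
import json
from pathlib import Path

# Assumed minimum for a usable form; the schema in section 3 may require more.
REQUIRED_KEYS = ("id", "title", "fields")


def load_forms(form_depo: Path) -> dict[str, dict]:
    """Scan *.json in form_depo and return valid form definitions keyed by form id."""
    forms: dict[str, dict] = {}
    for path in sorted(form_depo.glob("*.json")):
        try:
            form = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # unreadable or malformed file: skip it
        if not isinstance(form, dict) or any(key not in form for key in REQUIRED_KEYS):
            continue  # not a form definition
        if not isinstance(form["id"], str) or not isinstance(form["fields"], list):
            continue
        # The id inside the JSON is canonical; the file name is ignored.
        forms[form["id"]] = form
    return forms
```

Keying the result by the id inside the JSON mirrors the rule that file names may be arbitrary.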

2.2 Intake Mode and `--FORMS`

Trigger: The Agent’s main script is run with --FORMS (e.g. python run_agent.py --FORMS or python -m Agent.main --FORMS). The main() in Agent/main.py reads this flag.
Behaviour when --FORMS is set:
- The Agent operates in intake mode.
- Forms are loaded by the API from the form-depo path: when FORM_DEPO_PATH is unset, <project_root>/Agent/form-depo; set FORM_DEPO_PATH to override.
- No domain RAG, no file_search, no domain Q&A. The bot’s only job is form intake: form selection (if multiple) and then field collection for the chosen form.
When --FORMS is not set: The Agent runs in the existing, normal mode (domain RAG, Pipecat, etc.). Intake and form-depo are not used.

2.3 Single Form vs Multiple Forms

Single form in form-depo: Bot can go straight to the form’s introduction and first field (optionally still mention which form it is).
Multiple forms in form-depo: Bot must first disambiguate:
- Use each form’s intent/selection instructions (and optionally keywords) so the LLM can interpret what the user wants.
- If the user’s intent is unclear or several forms could match, the bot presents options (e.g. using title or short_label per form) and asks the user to choose. Once chosen, load that form and start intake.

3. Form Schema (JSON)

3.1 Goals

Form identity, title, introduction, and fields.
Intent/selection instructions for the LLM to decide when this form is the one the user wants (and for disambiguation when there are multiple forms).
Multi-language for all user-facing strings.
Field types, required/optional, options for select.

3.2 Proposed Structure

{
  "id": "contact_form",
  "title": { "en": "Contact Form", "fa": "فرم تماس" },
  "short_label": { "en": "Contact", "fa": "تماس" },
  "intent_instructions": {
    "en": "Choose this form when the user wants to: send a message, get in touch, leave contact info, submit feedback, or ask a question.",
    "fa": "این فرم وقتی انتخاب شود که کاربر بخواهد: پیام بفرستد، تماس بگیرد، اطلاعات تماس بگذارد، بازخورد بدهد یا سؤال بپرسد."
  },
  "introduction": {
    "en": "I'll help you with that. I'll ask a few questions.",
    "fa": "کمک می‌کنم. چند سؤال می‌پرسم."
  },
  "fields": [
    {
      "id": "full_name",
      "type": "text",
      "required": true,
      "label": { "en": "Full name", "fa": "نام کامل" },
      "placeholder": { "en": "e.g. John Doe", "fa": "مثال: علی محمدی" }
    },
    {
      "id": "email",
      "type": "email",
      "required": true,
      "label": { "en": "Email", "fa": "ایمیل" }
    },
    {
      "id": "message",
      "type": "textarea",
      "required": false,
      "label": { "en": "Message", "fa": "پیام" }
    }
  ],
  "submit_label": { "en": "Submit", "fa": "ارسال" },
  "metadata": {
    "version": "1.0",
    "supported_languages": ["en", "fa"]
  }
}

3.3 Intent and Form Selection

intent_instructions (object, one key per language): Instructions for the LLM to decide if the user’s message matches this form. Used when:
- There is a single form (to confirm it’s the right one), or
- There are multiple forms (to shortlist and, if needed, to present options).
short_label (object, per language): Short name for the “Which form do you want?” list. If missing, fallback to title or id.
When multiple forms match or intent is ambiguous: the bot lists forms using short_label (or title) and asks the user to pick. The LLM then continues with the chosen form’s schema.
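
As an illustration of the fallback order described above (short_label, then title, then id, each with the language fallback to "en"), a small sketch could look like this. The helper names pick_text, form_option_label, and list_form_options are hypothetical, as is the numbered-list output format.

```python
def pick_text(localized, language: str):
    """Return localized[language], falling back to English; None if neither exists."""
    if not isinstance(localized, dict):
        return None
    return localized.get(language) or localized.get("en")


def form_option_label(form: dict, language: str) -> str:
    """Label for the 'Which form do you want?' list: short_label, then title, then id."""
    return (
        pick_text(form.get("short_label"), language)
        or pick_text(form.get("title"), language)
        or form["id"]
    )


def list_form_options(forms: list[dict], language: str) -> list[str]:
    """Numbered options the bot can read out when intent is ambiguous."""
    return [
        f"{index}. {form_option_label(form, language)}"
        for index, form in enumerate(forms, start=1)
    ]
```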

3.4 Field Types

Type	Description	Validation (v1)	Example in stored data
`text`	Short text	Non-empty if required	`"John"`
`email`	Email	Basic format check (text, `@`, domain) when a value is given	`"a@b.com"`
`phone`	Phone	Optional	`"+1234567890"`
`number`	Numeric	Optional	`42`
`date`	Date	Optional (e.g. ISO)	`"2025-01-25"`
`select`	Single choice	Value in `options`	`"opt_a"`
`textarea`	Longer text	Non-empty if required	`"Hello..."`
`boolean`	Yes/no	Normalize to bool	`true`

select example:

{
  "id": "country",
  "type": "select",
  "required": true,
  "label": { "en": "Country", "fa": "کشور" },
  "options": [
    { "value": "ca", "label": { "en": "Canada", "fa": "کانادا" } },
    { "value": "us", "label": { "en": "United States", "fa": "ایالات متحده" } }
  ]
}
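
The v1 validation rules in the table above might be implemented along these lines. This is a sketch under assumptions: the function name validate_value, the (ok, value_or_message) return shape, the email regular expression, and the accepted yes/no spellings are illustrative choices, not part of the design.

```python
import re
from datetime import date

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # deliberately basic
TRUE_WORDS = {"true", "yes", "y", "1"}    # assumed spellings
FALSE_WORDS = {"false", "no", "n", "0"}   # assumed spellings


def validate_value(field: dict, value):
    """Return (True, normalized_value) or (False, error_message) for a non-null value."""
    ftype = field.get("type")
    if ftype in ("text", "textarea", "phone"):
        text = str(value).strip()
        if field.get("required") and not text:
            return False, "value must not be empty"
        return True, text
    if ftype == "email":
        text = str(value).strip()
        if EMAIL_RE.match(text):
            return True, text
        return False, "invalid email address"
    if ftype == "number":
        try:
            number = float(value)
        except (TypeError, ValueError):
            return False, "not a number"
        return True, int(number) if number.is_integer() else number
    if ftype == "date":
        try:
            return True, date.fromisoformat(str(value).strip()).isoformat()
        except ValueError:
            return False, "expected an ISO date such as 2025-01-25"
    if ftype == "select":
        allowed = [option["value"] for option in field.get("options", [])]
        if value in allowed:
            return True, value
        return False, "value is not one of the options"
    if ftype == "boolean":
        if isinstance(value, bool):
            return True, value
        word = str(value).strip().lower()
        if word in TRUE_WORDS:
            return True, True
        if word in FALSE_WORDS:
            return True, False
        return False, "expected yes or no"
    return False, f"unknown field type: {ftype}"
```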

3.5 Field Order and Required vs Optional

Order: fields array order = asking order. Bot asks the next required empty field first; optional fields follow or can be skipped.
Required: If required: true, the form is not considered complete until that field is in form_state with a non-null value.
Optional: If required: false, user may skip; we store form_state[field_id] = null (explicit skip). Downstream can distinguish "skipped" vs "never asked".
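
One way to read the ordering rule above is sketched below. The function name next_field_to_ask is hypothetical, and treating "key absent from form_state" as "never asked" is an interpretation of the skipped-versus-never-asked distinction.

```python
def next_field_to_ask(form: dict, form_state: dict):
    """Return the id of the next field to ask, or None when nothing is left.

    Required fields that are missing or null come first, in schema order.
    Optional fields follow, but only those never asked (key absent);
    an explicit skip is stored as null and is not asked again.
    """
    fields = form["fields"]
    for field in fields:
        if field.get("required") and form_state.get(field["id"]) is None:
            return field["id"]
    for field in fields:
        if not field.get("required") and field["id"] not in form_state:
            return field["id"]
    return None
```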

3.6 Localization

For title, short_label, intent_instructions, introduction, label, placeholder, submit_label, options[].label: use the user’s language with fallback to "en".

4. Intake Storage (intake folder, partial data)

4.1 Intake Folder

Location: When INTAKE_PATH is unset or empty, use <project_root>/Agent/intake. Set the INTAKE_PATH env var to override.
Role: All intake data is written here. No use of the /submissions API in intake mode; the intake folder is the persistence layer.

4.2 File Naming and Lifecycle

Filename: {form_name}_{session_start_timestamp}.json
- form_name = form’s id (e.g. contact_form).
- session_start_timestamp = set as soon as a form is selected (before the first record_field). Format: YYYYMMDD_HHmmss (e.g. 20250125_143022). Stable for the whole session.
One file per intake session. The file is created on form selection with an initial write (form_state: {}). The same file is then overwritten once per user turn (after all record_field and any submit_form for that turn) so that partial data is always on disk.
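
The naming rule can be sketched as two small helpers. The function names are hypothetical; only the YYYYMMDD_HHmmss format and the {form_name}_{session_start_timestamp}.json pattern come from the text above.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def new_session_timestamp(now: Optional[datetime] = None) -> str:
    """Format YYYYMMDD_HHmmss; computed once at form selection and then reused."""
    return (now or datetime.now()).strftime("%Y%m%d_%H%M%S")


def intake_file_path(intake_dir: Path, form_id: str, session_start_timestamp: str) -> Path:
    """Path of the single intake file for this session."""
    return intake_dir / f"{form_id}_{session_start_timestamp}.json"
```

Because the timestamp is passed in rather than recomputed, every later write in the session resolves to the same file.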

4.3 File Contents (Partial and Final)

Each write includes at least:

form_id: form’s id
started_at: ISO or same timestamp as in the filename
updated_at: time of this write
form_state: { "field_id": value } — partial while collecting, complete when submitted. Values can be null for skipped optional fields.
completed_at: set only when the form is submitted; omit otherwise.

Example (initial, on form selection):

{
  "form_id": "contact_form",
  "started_at": "2025-01-25T14:30:22",
  "updated_at": "2025-01-25T14:30:22",
  "form_state": {}
}

Example (partial, after two fields):

{
  "form_id": "contact_form",
  "started_at": "2025-01-25T14:30:22",
  "updated_at": "2025-01-25T14:30:45",
  "form_state": {
    "full_name": "John Doe",
    "email": "j@example.com"
  }
}

Example (final, after submit_form):

{
  "form_id": "contact_form",
  "started_at": "2025-01-25T14:30:22",
  "updated_at": "2025-01-25T14:31:10",
  "completed_at": "2025-01-25T14:31:10",
  "form_state": {
    "full_name": "John Doe",
    "email": "j@example.com",
    "message": "Hello, I have a question."
  }
}

4.4 When We Write

On form selection: As soon as the user has chosen a form (or the only form is chosen), create the intake file, set session_start_timestamp (format YYYYMMDD_HHmmss), and write the initial object: form_id, started_at, updated_at, form_state: {}, no completed_at. The response for that turn must include session_start_timestamp so the client can send it on /query/intake/continue.
On record_field: The handler only validates and updates form_state in memory; it does not write. The orchestrator in service_formintake does one write at end of the user turn (after all record_field and any submit_form for that turn). The file already exists from the form-selection write. If submit_form runs in that turn, its handler performs the final write (with completed_at) and no separate end-of-turn write is needed.
On submit_form: Same file; add completed_at and do a final write with the complete form_state.
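
The three write points above all produce the same object shape, so a single helper could serve them. The sketch below is an assumption-laden illustration: write_intake_file is a hypothetical name, and writing to a temporary file followed by os.replace is an added safety choice that the design does not specify.

```python
import json
import os
from datetime import datetime
from pathlib import Path
from typing import Optional


def write_intake_file(path: Path, form_id: str, started_at: str, form_state: dict,
                      completed: bool = False, now: Optional[datetime] = None) -> dict:
    """Overwrite the session's intake file with the current state; return the record."""
    stamp = (now or datetime.now()).isoformat(timespec="seconds")
    record = {"form_id": form_id, "started_at": started_at, "updated_at": stamp}
    if completed:
        record["completed_at"] = stamp  # present only once the form is submitted
    record["form_state"] = form_state
    # Assumption: write to a sibling temp file, then swap it in, so a reader
    # never sees a half-written intake file.
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(record, ensure_ascii=False, indent=2), encoding="utf-8")
    os.replace(tmp, path)
    return record
```

The initial write passes an empty form_state, end-of-turn writes pass the current form_state, and the submit_form write passes completed=True.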

5. User Stories and Requirements

5.1 Happy Path (Single Form)

#	Actor	Action	Expected outcome
U1	User	Starts a conversation (chat or voice) and wants to fill the only form in form-depo	Bot introduces the form with `introduction[language]` and asks for the first required field.
U2	User	Answers a field (e.g. "John Doe" for full_name)	Bot records it via `record_field`; at the end of the turn the partial data is written to `{form_id}_{session_start_timestamp}.json`. Bot confirms briefly and asks for the next field.
U3	User	Provides all required (and any optional) fields	Bot calls `submit_form`, writes the final state with `completed_at` to the same file, and confirms.

5.2 Form Selection (Multiple Forms)

#	Actor	Action	Expected outcome
U4	User	Says something ambiguous (e.g. "I need to send something") and form-depo has several forms	Bot uses `intent_instructions` to shortlist; if still ambiguous, presents options using `short_label` (or `title`) and asks the user to choose.
U5	User	Picks a form (e.g. "Contact" or "Option 1")	Bot loads that form’s schema, creates the intake file `{form_id}_{session_start_timestamp}.json` with `form_state: {}`, includes `session_start_timestamp` in the response, says the form’s introduction, and asks for the first required field.
U6	User	Says something that clearly matches one form’s `intent_instructions`	Bot chooses that form, creates the intake file, includes `session_start_timestamp` in the response, gives the introduction, and asks for the first field.

5.3 Multi-Turn and Continue

#	Scenario	Requirement
U7	User pauses mid-form	Client stores `form_id`, `form_state`, and `session_start_timestamp`; on resume, sends all three so the bot continues from the next empty required field. The intake file for that session remains; we keep overwriting with partial writes when more fields are recorded.
U8	User corrects a previous answer	Bot overwrites via `record_field` with the same `field_id` and new value; on the next write, the intake file reflects the correction.
U9	User asks "what did I say for email?"	Bot reads from `form_state` and answers.

5.4 Multi-Lingual and Voice

#	Scenario	Requirement
U10	User speaks or types in Farsi	Bot responds in Farsi; uses `label["fa"]`, `introduction["fa"]`, `short_label["fa"]`, `intent_instructions["fa"]`.
U11	User uses voice (Pipecat/Agent)	In intake mode, the same Agent/Pipecat stack is used; STT/TTS and language selection stay as today. No form UI—conversation only.

5.5 Validation, Skip, and Errors

#	Scenario	Requirement
U12	Invalid value (e.g. email)	`record_field` validates; on failure it returns an error and does not update `form_state`, so the invalid value is never written to the intake file. The bot then re-asks.
U13	User says "skip" for optional field	Bot calls `record_field(field_id, null)`; handler sets `form_state[field_id] = null`. Next write includes that key with `null`.
U14	User says "skip" for required field	Bot explains it is required and re-asks.
U15	`submit_form` but required fields missing	Tool returns `{ ok: false, missing: [...] }`; LLM asks for those; no `completed_at` write.

5.6 Abandonment and Edge Cases

#	Scenario	Requirement
U16	User says "cancel" or "never mind"	Bot acknowledges (natural language; no tool). Client clears `form_id`, `form_state`, and `session_start_timestamp`. The intake file stays as last partial write (no `completed_at`).
U17	User sends chitchat mid-form	Bot answers briefly and returns to the pending field.
U18	User gives multiple fields in one message	LLM may call `record_field` multiple times; each valid call updates `form_state`; one write at end of the turn (orchestrator).

6. Bot Behaviour (System Prompt / Instructions)

6.1 Role and Constraints

Role: Conversational form intake assistant. Use only the form schema(s) from form-depo. Do not use file_search, RAG, or domain knowledge. In intake mode the only tools are record_field, submit_form, and (if we add it) select_form or equivalent for form selection.
No form UI: You never describe or render a form. You only ask questions in natural language and record answers.

6.2 Form and State

Form: id, title, short_label, intent_instructions, introduction, fields (order = asking order). form_state = { field_id: value }. Instructions must let the model derive: missing required fields, next field to ask. Do not inject progress (e.g. "field 3 of 8"); the model may infer it from form_state and fields if it wishes.

6.3 Form Selection (Multiple Forms)

Use each form’s intent_instructions[language] to decide if the user’s message matches that form.
If exactly one matches: select it, create the intake file, say introduction, ask the first required field.
If several match or unclear: list forms by short_label (or title), ask the user to choose. On choice, load that form and proceed as above.
If none match: ask what the user wants to do and, if possible, map to a form or say we don’t have a matching form.

6.4 First Turn for a Chosen Form (Empty `form_state`)

Say the form’s introduction in the user’s language.
Ask for the first required field using label[language] (and placeholder or description if present).

6.5 When the User Provides a Value

Unambiguous: Call record_field with field_id and extracted value. The handler updates form_state only; the orchestrator does one write at end of the turn to the intake file. Then ask the next required empty field (or optional, then submit_form when done).
Ambiguous: Ask a short clarification before calling record_field.
Multiple fields in one message: Call record_field once per field; multiple record_field per message are allowed; the orchestrator does one write at end of the turn.
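
The one-write-per-turn rule can be sketched as a small loop. Everything here is hypothetical scaffolding: run_turn, the (name, arguments) tuple shape for tool calls, and the write_partial callback are stand-ins for whatever the orchestrator in service_formintake actually uses.

```python
def run_turn(tool_calls, handlers, write_partial):
    """Run one user turn's tool calls in order, then write at most once.

    tool_calls: list of (tool_name, kwargs) produced by the model for this turn.
    handlers: maps tool names to callables; record_field only mutates form_state,
              submit_form performs its own final write when it succeeds.
    write_partial: callback that writes the current form_state to the intake file.
    """
    results = []
    submitted = False
    for name, kwargs in tool_calls:
        result = handlers[name](**kwargs)
        results.append(result)
        if name == "submit_form" and result.get("ok"):
            submitted = True
    if not submitted:
        # Single end-of-turn write; skipped when submit_form already wrote.
        write_partial()
    return results
```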

6.6 When All Required (and Desired Optional) Are Filled

Call submit_form. The handler writes the final state with completed_at to the intake file. Model can then give a short confirmation.

6.7 Correction, Skip, Cancel, and Chitchat

Correction: record_field with same field_id and new value (overwrite). Next write to the intake file has the corrected form_state.
Skip (optional): Call record_field(field_id, null); handler sets form_state[field_id] = null. Move on to next field.
Skip (required): Explain and re-ask.
Cancel: User says "cancel" or "never mind". Acknowledge in natural language; no cancel_form tool. Client clears form_id, form_state, and session_start_timestamp. Intake file remains as last partial write (no completed_at).
Chitchat: Answer briefly and return to the pending field.

7. Tools: `record_field` and `submit_form`

7.1 `record_field`

Purpose: Record one field: validate and update form_state. Used for first-time recording, corrections (same field_id, new value → overwrite), and optional skip (value: null → form_state[field_id] = null). Does not write to disk; the orchestrator in service_formintake does one write at end of the user turn (after all tool calls for that turn).
When: When the user has clearly given a value for a known field (first time or correction), or explicitly skips an optional field (value: null).
Parameters: field_id, value (value may be null only for optional fields—skip).
Handler:
- If value is null: only allowed for optional fields. If the field is required: return { ok: false, error: "validation_failed", message: "Cannot skip required field" }. If optional: form_state[field_id] = null; return { ok: true, recorded: field_id }.
- If value is not null: validate field_id in form_schema.fields and value (e.g. email format). If invalid: return { ok: false, error: "validation_failed", message: "..." }; do not update form_state or write.
- If valid: form_state[field_id] = value. Return { ok: true, recorded: field_id }. Do not write to the intake file; the service/orchestrator performs one write per turn after all record_field (and any submit_form) for that turn.
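
A minimal sketch of the handler logic above follows. The signature is an assumption: in particular, the optional validate callback (returning an error message or None) stands in for the per-type validation, and the "Unknown field" message is illustrative.

```python
def record_field(form: dict, form_state: dict, field_id: str, value, validate=None) -> dict:
    """Validate and update form_state in memory; never writes to disk."""
    field = next((f for f in form["fields"] if f["id"] == field_id), None)
    if field is None:
        return {"ok": False, "error": "validation_failed",
                "message": f"Unknown field: {field_id}"}
    if value is None:
        # null means "skip", which is only allowed for optional fields.
        if field.get("required"):
            return {"ok": False, "error": "validation_failed",
                    "message": "Cannot skip required field"}
        form_state[field_id] = None
        return {"ok": True, "recorded": field_id}
    problem = validate(field, value) if validate else None
    if problem:
        return {"ok": False, "error": "validation_failed", "message": problem}
    form_state[field_id] = value  # first-time value or correction (overwrite)
    return {"ok": True, "recorded": field_id}
```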

7.2 `submit_form`

Purpose: Mark the form complete and write the final state to the intake file.
When: Only when form_state has every required field.
Parameters: None.
Handler:
1. Check all required fields are present and non-empty. If not: return { ok: false, error: "missing_fields", missing: [...] }; do not write completed_at.
2. If valid: set completed_at; write to the same {form_id}_{session_start_timestamp}.json the object with form_id, started_at, updated_at, completed_at, form_state. Return { ok: true, form_submitted: true, message: "<localized success>" }.
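
The two handler steps might look like this in outline. The names missing_required and write_final are hypothetical, the success message is a placeholder for the localized text, and treating an empty string as "empty" is one reading of "present and non-empty".

```python
def missing_required(form: dict, form_state: dict) -> list:
    """Ids of required fields that are absent, null, or an empty string."""
    return [
        field["id"]
        for field in form["fields"]
        if field.get("required") and form_state.get(field["id"]) in (None, "")
    ]


def submit_form(form: dict, form_state: dict, write_final) -> dict:
    """Step 1: check required fields. Step 2: final write with completed_at."""
    missing = missing_required(form, form_state)
    if missing:
        # No write happens here, so completed_at is never set.
        return {"ok": False, "error": "missing_fields", "missing": missing}
    # write_final is assumed to add completed_at and overwrite the session's file.
    write_final(form_state)
    return {"ok": True, "form_submitted": True, "message": "Form submitted."}
```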

7.3 Where These Tools Live

In LLM_full/query/service_formintake.py only. This module is the main intake service: it implements form loading, instructions, record_field, submit_form, and intake-folder writes. Not in the global domain-tools registry. service_vector.py is unchanged and is not used for intake.

8. Intake Mode: Agent and Startup

8.1 Entrypoint

Script: run_agent.py (or python -m Agent.main). Both should support --FORMS.
main() in Agent/main.py: On startup, check for --FORMS. If present:
- Set intake mode (e.g. INTAKE_MODE=True or equivalent).
- Start the rest of the Agent (WebSocket, Pipecat, etc.) configured so that only the intake pipeline runs (no domain /query with RAG). The API (service_formintake) resolves the form-depo and intake paths from settings: when FORM_DEPO_PATH or INTAKE_PATH is unset, it defaults to <project_root>/Agent/form-depo or <project_root>/Agent/intake respectively; set the env vars to override. In intake mode the bot only does form selection and field collection, while the API loads forms and writes to the intake folder.

8.2 When `--FORMS` Is Not Set

Agent runs as today: domain RAG, Pipecat, no form-depo, no intake folder. No change to existing behaviour.

9. Request and Response (Intake Mode)

In intake mode the Agent typically handles conversations via WebSocket or Pipecat. The exact message shape can match the existing chat/voice protocol; the following is the logical contract for intake.

9.1 Form Selection Phase

Request: User message (text or STT).
Response: Either (a) a list of form options asking the user to choose, or (b) direct introduction to the chosen form and the first question. No form_schema needs to be sent by the client; forms come from form-depo.

9.2 Field Collection Phase

Request: User message. The server must know form_id, form_state, and session_start_timestamp (from the form-selection response or from the client on continue).
Response: answer, form_id, form_state; when the intake file was created on form selection, also session_start_timestamp (so the client can send it on every continue). When the form is submitted: form_submitted: true, form_data: form_state. The intake file is created on form selection, overwritten once per user turn after that turn's record_field calls, and finalized by submit_form, as described in §4.

9.3 Session and Continuation

The Agent (or client) must track per session: form_id, form_state, session_start_timestamp (required so we can read/write the same intake file). The client must send session_start_timestamp on every /query/intake/continue (it is returned in the response when the file is first created on form selection).
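
For illustration, the continuation payload could be modelled as below. This uses a standard-library dataclass as a stand-in for the Pydantic model planned in models_intake.py; the question field name and the timestamp format check are assumptions.

```python
import re
from dataclasses import dataclass, field

TIMESTAMP_RE = re.compile(r"^\d{8}_\d{6}$")  # YYYYMMDD_HHmmss


@dataclass
class IntakeContinueRequest:
    """Stand-in for the planned Pydantic model; field names are illustrative."""
    question: str                 # the user's message for this turn (assumed name)
    form_id: str
    session_start_timestamp: str  # returned by the form-selection response
    form_state: dict = field(default_factory=dict)

    def __post_init__(self):
        # Without a well-formed timestamp the server cannot find the intake file.
        if not TIMESTAMP_RE.match(self.session_start_timestamp):
            raise ValueError("session_start_timestamp must look like 20250125_143022")
```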

10. Folder Layout (Summary)

domain-chatbot/
  Agent/
    form-depo/           # Form definitions (JSON), one file per form
    intake/              # Intake data: {form_id}_{YYYYMMDD_HHmmss}.json
    main.py              # Reads --FORMS, switches to intake mode
  LLM_full/
    query/
      service_formintake.py   # Main intake service (record_field, submit_form, intake writes)
      service_vector.py      # Unchanged; handles /query, /query/continue (non-intake)
      models_intake.py       # Intake request/response models
      router.py             # Adds /query/intake, /query/intake/continue
  ...
run_agent.py             # Entrypoint; --FORMS in sys.argv passed to Agent.main

form-depo: input (form schemas).
intake: output (partial and final intake data, timestamped, prefixed by form name).
service_formintake.py: main service for intake; does not replace or modify service_vector.py.

11. Detail Design: Files to Add and Modify

This section lists every file to add or modify. The goal is to minimize changes and to avoid breaking existing behaviour: service_vector.py is not modified and continues to handle the existing /query and /query/continue flows. The main service for form intake is service_formintake.py (new): it is used only by the new intake routes and does not replace or import from service_vector.py.

11.1 New Files

File	Purpose
`LLM_full/query/service_formintake.py`	Main form-intake service. Implements: load forms from `FORM_DEPO_PATH`; build form-filling and form-selection instructions; on form selection: create intake file with `form_state: {}`, set `session_start_timestamp`, include it in the response; handle `record_field` and `submit_form` (validation and intake-folder writes); `process_intake_query` and `process_intake_continue`. Uses `INTAKE_PATH` for `{form_id}_{session_start_timestamp}.json`. Does not import from `service_vector.py`. No RAG, no file_search.
`LLM_full/query/models_intake.py`	Intake-specific Pydantic models: `IntakeQueryRequest`, `IntakeContinueRequest`, and any response helpers. `IntakeContinueRequest` must include `session_start_timestamp` (returned when the file is created on form selection). Response shape includes `session_start_timestamp` when an intake file is created. Keeps `LLM_full/query/models.py` unchanged.
`Agent/form-depo/.gitkeep`	Ensures `Agent/form-depo/` exists in the repo.
`Agent/form-depo/contact_form.json` (optional)	Example form definition for development; can be omitted if not needed.
`Agent/intake/.gitkeep`	Ensures `Agent/intake/` exists; actual intake files are created at runtime.

11.2 Modified Files (Minimal, Non-Breaking)

11.2.1 `LLM_full/query/router.py`

Add only; do not change any existing route or handler.
Add import: from .service_formintake import process_intake_query, process_intake_continue
Add import: from .models_intake import IntakeQueryRequest, IntakeContinueRequest (or the exact model names used).
Add two new routes:
- POST /intake → process_intake_query (with IntakeQueryRequest, Request, Depends(get_current_user)). Path under the existing query_router becomes /query/intake.
- POST /intake/continue → process_intake_continue (with IntakeContinueRequest, Request, Depends(get_current_user)). Path becomes /query/intake/continue.
Do not change POST /, POST /stream, POST /continue, POST /continue/stream, or any import from service_vector.

11.2.2 `LLM_full/settings.py`

Add two optional settings only; do not change any existing key or required flag.
- FORM_DEPO_PATH: env_manager.get_str("FORM_DEPO_PATH", required=False, default="")
- INTAKE_PATH: env_manager.get_str("INTAKE_PATH", required=False, default="")
Add corresponding @property or accessors. When the env value is unset or empty, the accessor returns the default: <project_root>/Agent/form-depo (resp. <project_root>/Agent/intake), where project_root is the domain-chatbot root (e.g. derived from __file__ or cwd). When the env is set, use that value. Set FORM_DEPO_PATH or INTAKE_PATH only to override the default.
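
The accessor logic can be sketched independently of env_manager, whose API is not shown here. The helper below reads os.environ directly as a stand-in; resolve_dir and its parameters are hypothetical names.

```python
import os
from pathlib import Path


def resolve_dir(env_name: str, default_subdir: str, project_root: Path) -> Path:
    """Return the env override if set and non-empty, else <project_root>/Agent/<subdir>."""
    value = os.environ.get(env_name, "").strip()
    if value:
        return Path(value)
    return project_root / "Agent" / default_subdir
```

For example, resolve_dir("FORM_DEPO_PATH", "form-depo", root) and resolve_dir("INTAKE_PATH", "intake", root) would back the two accessors.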

11.2.3 `Agent/main.py`

At startup, before create_websocket_server() and create_webrtc_runner(): set INTAKE_MODE = "--FORMS" in sys.argv (or equivalent).
Pass intake_mode=INTAKE_MODE into create_websocket_server(...) and create_webrtc_runner(...) (or the relevant runner factory). If those functions do not yet accept intake_mode, add an optional parameter intake_mode=False so existing callers remain valid.
Do not change any other startup or runner logic.

11.2.4 `Agent/server/websocket.py`

Add optional parameter intake_mode: bool = False to create_websocket_server(...) (or to the factory that creates the WebSocket app). If the function does not take parameters, introduce an optional intake_mode=False and pass it through to the connection handler that creates the session/bot.
In the connection handler that creates ChatSessionLite (or the object that calls the backend): pass intake_mode=intake_mode into that constructor.
Do not change the rest of the WebSocket protocol or existing non-intake behaviour.

11.2.5 `Agent/core/session.py`

Add optional parameter intake_mode: bool = False to ChatSessionLite.__init__(...). Pass it through to AventoraChatbot(..., intake_mode=intake_mode).
Do not change process_message or other method signatures for non-intake callers; only add an optional argument with default False.

11.2.6 `LLM/aventora_chatbot_d.py`

Add optional parameter intake_mode: bool = False to AventoraChatbot.__init__(...). Store it on the instance.
Where the bot calls the API (query_info, query_info_stream, continue/continue_stream, or equivalent): branch on intake_mode. If True, call the intake API (e.g. /query/intake and /query/intake/continue) instead of /query and /query/continue. Use the same auth and session pattern as today; only the URL path (and possibly request body shape for intake) changes.
Do not change behaviour when intake_mode is False.
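
The branch described above amounts to choosing a URL path. A tiny sketch, with backend_path as a hypothetical helper name:

```python
def backend_path(intake_mode: bool, is_continue: bool) -> str:
    """Pick the backend route; only the path differs between the two modes."""
    base = "/query/intake" if intake_mode else "/query"
    return f"{base}/continue" if is_continue else base
```

With intake_mode left at its default of False, the paths are the existing /query and /query/continue, so non-intake behaviour is unchanged.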

11.2.7 `LLM/api_client_openai_d.py`

Add new methods (do not change existing query_info, query_info_stream, continue, etc.):
- query_intake_info(...) — POST {BACKEND_URL}/query/intake with intake-specific payload (e.g. name, email, question, language, form_id?, form_state?, session? as needed by models_intake).
- continue_intake(...) — POST {BACKEND_URL}/query/intake/continue with IntakeContinueRequest-style payload.
Optionally add streaming variants if needed later; the design can assume non-stream intake first.
AventoraChatbot when intake_mode=True calls query_intake_info and continue_intake instead of query_info and continue.

11.3 Files Explicitly Not Modified

File	Reason
`LLM_full/query/service_vector.py`	Remains the handler for `POST /query` and `POST /query/continue` (and their stream variants). Intake uses `service_formintake.py` only. No imports from `service_vector` in `service_formintake`; no changes to `service_vector`.
`LLM_full/query/models.py`	`QueryRequest` and `ContinueRequest` stay unchanged. Intake models live in `models_intake.py`.
`LLM_full/main.py`	The new intake routes are on the existing `query_router` (`/query/intake`, `/query/intake/continue`). No new router or `include_router` change.
`run_agent.py`	`sys.argv` already contains `--FORMS` when run as `python run_agent.py --FORMS`; `Agent.main.main()` reads it. No change.

11.4 Optional or Follow-On Wiring

Pipecat / WebRTC: If the Pipecat pipeline (or WebRTC runner) uses a different code path to call the backend (e.g. Agent/audio/processors.py, Agent/server/webrtc.py), that path must also use /query/intake and /query/intake/continue when intake_mode is True. The same intake_mode flag should be passed from main() into that runner/processor. The exact file(s) to modify depend on where the HTTP call is made; the design assumes it is either shared with ChatSessionLite/AventoraChatbot or receives intake_mode from the same place.
Streaming for intake: service_formintake can implement process_intake_query_stream and process_intake_continue_stream later; the router can add POST /intake/stream and POST /intake/continue/stream when needed. Not required for the first version.

11.5 Summary Table

Action	File
Add	`LLM_full/query/service_formintake.py`
Add	`LLM_full/query/models_intake.py`
Add	`Agent/form-depo/.gitkeep` (and optionally `contact_form.json`)
Add	`Agent/intake/.gitkeep`
Modify	`LLM_full/query/router.py` (new routes and imports only)
Modify	`LLM_full/settings.py` (add `FORM_DEPO_PATH`, `INTAKE_PATH`)
Modify	`Agent/main.py` (`--FORMS`, `intake_mode`, pass to runners)
Modify	`Agent/server/websocket.py` (`intake_mode` param, pass to session)
Modify	`Agent/core/session.py` (`intake_mode` param, pass to bot)
Modify	`LLM/aventora_chatbot_d.py` (`intake_mode`, use intake API when True)
Modify	`LLM/api_client_openai_d.py` (new `query_intake_info`, `continue_intake`)
Do not modify	`LLM_full/query/service_vector.py`, `LLM_full/query/models.py`, `LLM_full/main.py`, `run_agent.py`

12. Resolved Open Points (All Decided)

Topic	Question / decision
~~Form-depo path~~	Decided: When `FORM_DEPO_PATH` unset, use `<project_root>/Agent/form-depo`; set env to override.
~~Intake path~~	Decided: When `INTAKE_PATH` unset, use `<project_root>/Agent/intake`; set env to override.
~~session_start_timestamp~~	Decided: Create the file and set `session_start_timestamp` on form selection (before the first `record_field`). Response includes `session_start_timestamp`; client must send it on every continue.
~~Write frequency~~	Decided: One write per user turn — after all `record_field` and any `submit_form` for that turn. `record_field` only updates `form_state`; the orchestrator writes once at end of turn. `submit_form` does its own final write when it runs.
~~Batch `record_field`~~	Decided: Multiple `record_field` per message allowed; one write at end of the turn (orchestrator).
~~Corrections~~	Decided: Overwrite via `record_field` only. No `update_field` tool. For corrections, call `record_field` with the same `field_id` and the new value; the handler overwrites `form_state[field_id]`.
~~Cancel~~	Decided: Natural language + client. No `cancel_form` tool. LLM recognizes "cancel"/"never mind" and acknowledges. Client clears `form_id`, `form_state`, and `session_start_timestamp`. Intake file stays as last partial write (no `completed_at`).
~~Optional skip~~	Decided: Store `null` in `form_state`. When the user skips an optional field, set `form_state[field_id] = null`. Use `record_field(field_id, null)`; handler accepts `null` for optional fields only. Downstream can distinguish "skipped" vs "never asked"; consumers must handle `null`.
~~Progress~~	Decided: Leave to the model. Do not inject "field 3 of 8" or similar. Provide `form_schema`, `form_state`, and instructions to ask the next required empty field; the model may infer and mention progress from `form_state` and `fields` if it chooses.
~~Backend in intake mode~~	Decided: Agent calls domain-chatbot `/query/intake` and `/query/intake/continue`. In `--FORMS` mode the Agent (WebSocket, Pipecat, etc.) calls these instead of `/query` and `/query/continue`. `service_formintake` in domain-chatbot runs the LLM, tools, and intake writes. No local/simplified LLM loop in the Agent for intake.

13. Example End-to-End (Multiple Forms, Partial Data)

1) User starts, two forms in form-depo

User: “I want to send a message.”
Bot uses intent_instructions; both “Contact” and “Feedback” match. Bot: “Do you want the Contact form or the Feedback form?” (using short_label).
User: “Contact.”
Bot loads contact_form, creates Agent/intake/contact_form_20250125_143022.json with form_id, started_at, updated_at, form_state: {}. The response includes session_start_timestamp (e.g. 20250125_143022) so the client can send it on continue. Says introduction and: “What is your full name?”

2) User gives name

User: “John Doe.”
Model calls record_field("full_name","John Doe"). Handler updates form_state = { "full_name": "John Doe" } (does not write). Orchestrator writes once at end of turn to contact_form_20250125_143022.json (partial, no completed_at).
Bot: “Thanks. What is your email?”

3) User gives email and message

User: “j@example.com. My message is: I have a question.”
Model calls record_field("email","j@example.com") then record_field("message","I have a question."). Handler updates form_state for each (does not write); form_state now holds all three fields. Model calls submit_form; its handler adds completed_at and writes the final object to contact_form_20250125_143022.json (this is the write for the turn; the orchestrator does no separate write when submit_form runs).
Bot: “Form submitted. Thank you.”

From form selection onward, Agent/intake/contact_form_20250125_143022.json contains up-to-date partial (or, after submit_form, complete) data: an empty form_state at first, then the fields recorded so far.

Add clarifications and decisions in the document as you refine.
