{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
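    "# Personality Consistency Evaluation: Hands-on API Programming\n",
    "\n",
    "> **Chapter 3, Section 5: Hands-on Lab** | ModelScope platform, CPU environment\n",
    "\n",
    "**Prerequisite**: you have completed the Section 2.5 hands-on lab and have a working standardized-patient workflow.\n",
    "\n",
    "This notebook has three parts:\n",
    "1. **Part 1**: environment setup and loading the test questions\n",
    "2. **Part 2**: ten single-turn personality evaluations, chaining the patient workflow into the evaluator workflow\n",
    "3. **Part 3**: an overall personality-consistency verdict after a multi-turn conversation\n",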
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Part 1: Environment Setup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-10T07:01:33.858938Z",
     "iopub.status.busy": "2026-04-10T07:01:33.858744Z",
     "iopub.status.idle": "2026-04-10T07:01:36.949690Z",
     "shell.execute_reply": "2026-04-10T07:01:36.949039Z",
     "shell.execute_reply.started": "2026-04-10T07:01:33.858921Z"
    },
    "tags": []
   },
   "outputs": [
   ],
   "source": [
    "!pip install requests -q"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
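    "## Configuration\n",
    "\n",
    "Besides your Coze API key (`API_KEY`), you need to fill in **two** workflow IDs:\n",
    "- `PATIENT_WORKFLOW_ID`: the **patient** workflow created in the Section 2.5 lab\n",
    "- `EVALUATOR_WORKFLOW_ID`: the **personality evaluator** workflow created in this section"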
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "ExecutionIndicator": {
     "show": false
    },
    "execution": {
     "iopub.execute_input": "2026-04-10T07:02:29.563156Z",
     "iopub.status.busy": "2026-04-10T07:02:29.562958Z",
     "iopub.status.idle": "2026-04-10T07:02:29.600760Z",
     "shell.execute_reply": "2026-04-10T07:02:29.600119Z",
     "shell.execute_reply.started": "2026-04-10T07:02:29.563137Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Configuration done!\n",
      "Patient workflow ID:   7626777633124089902\n",
      "Evaluator workflow ID: 7627019049369649194\n"
     ]
    }
   ],
   "source": [
    "import requests\n",
    "import json\n",
    "import time\n",
    "import re\n",
    "from datetime import datetime\n",
    "\n",
    "# ============================================================\n",
    "# Edit the settings below\n",
    "# ============================================================\n",
    "API_KEY = \"pat_f76JZa8qYHtK57EFWnxxxxgf0JtUkDguwkcNW6fNzvNfS2aV4\"  # replace with your own API key\n",
    "PATIENT_WORKFLOW_ID = \"7626777xxx4089902\"        # patient workflow from the Section 2.5 lab\n",
    "EVALUATOR_WORKFLOW_ID = \"762701xxx369649194\"     # evaluator workflow created in this section\n",
    "# ============================================================\n",
    "\n",
    "# Expected Big Five scores for the patient 李文清 (Li Wenqing); must match the patient prompt\n",
    "EXPECTED_SCORES = {\"O\": 3, \"C\": 3, \"E\": 2, \"A\": 3, \"N\": 4}\n",
    "DIM_LABELS = {\n",
    "    \"O\": \"Openness\", \"C\": \"Conscientiousness\",\n",
    "    \"E\": \"Extraversion\", \"A\": \"Agreeableness\", \"N\": \"Neuroticism\"\n",
    "}\n",
    "\n",
    "API_URL = \"https://api.coze.cn/v1/workflow/run\"\n",
    "HEADERS = {\n",
    "    \"Authorization\": f\"Bearer {API_KEY}\",\n",
    "    \"Content-Type\": \"application/json\"\n",
    "}\n",
    "\n",
    "print(\"Configuration done!\")\n",
    "print(f\"Patient workflow ID:   {PATIENT_WORKFLOW_ID}\")\n",
    "print(f\"Evaluator workflow ID: {EVALUATOR_WORKFLOW_ID}\")"
   ]
  },
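  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The template values above are partly masked with `xxx`. A run with unedited settings only fails later, with an API error. A minimal sketch of an early check, assuming that any value still containing `xxx` is an unfilled placeholder. `find_placeholder_settings` is a hypothetical helper, not part of the notebook:\n",
    "\n",
    "```python\n",
    "def find_placeholder_settings(settings, marker='xxx'):\n",
    "    # Hypothetical helper: names of settings whose value still contains the placeholder marker\n",
    "    return [name for name, value in settings.items() if marker in value]\n",
    "\n",
    "\n",
    "# Illustrative input shaped like the configuration cell (not real credentials)\n",
    "example = {\n",
    "    'API_KEY': 'pat_xxxx',\n",
    "    'PATIENT_WORKFLOW_ID': '7626777xxx4089902',\n",
    "    'EVALUATOR_WORKFLOW_ID': '7627019049369649194',\n",
    "}\n",
    "print(find_placeholder_settings(example))  # ['API_KEY', 'PATIENT_WORKFLOW_ID']\n",
    "```\n",
    "\n",
    "In the notebook you would pass the real `API_KEY`, `PATIENT_WORKFLOW_ID` and `EVALUATOR_WORKFLOW_ID`, and stop with an error if the returned list is not empty."
   ]
  },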
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
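    "## Load the Test Questions\n",
    "\n",
    "Read the 10 test questions from `personality_test_questions.txt`.\n",
    "\n",
    "If the file does not exist, a default question set is created automatically. Keep the questions in the order O, O, C, C, E, E, A, A, N, N, because each question is mapped to its Big Five dimension by position."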
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-10T07:02:35.097381Z",
     "iopub.status.busy": "2026-04-10T07:02:35.097210Z",
     "iopub.status.idle": "2026-04-10T07:02:35.114954Z",
     "shell.execute_reply": "2026-04-10T07:02:35.114458Z",
     "shell.execute_reply.started": "2026-04-10T07:02:35.097365Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Created default question file: personality_test_questions.txt\n",
      "Loaded 10 test questions:\n",
      "\n",
      "  1. [Openness] 如果医生建议您试试一种新的中药调理方案，您会怎么想？\n",
      "  2. [Openness] 您有没有想过自己去了解一下高血压和糖尿病的知识，比如看书或者上网查？\n",
      "  3. [Conscientiousness] 您平时是怎么记得按时吃药的？有没有用什么方法提醒自己？\n",
      "  4. [Conscientiousness] 您上次忘记吃药是什么时候的事？忘了之后您一般怎么处理？\n",
      "  5. [Extraversion] 您平时在家一般做些什么？会不会出去和邻居聊聊天散散步？\n",
      "  6. [Extraversion] 您觉得一个人住习惯吗？会不会有时候觉得有点孤单？\n",
      "  7. [Agreeableness] 如果护士建议您改变饮食习惯，少吃盐少吃糖，您觉得能做到吗？\n",
      "  8. [Agreeableness] 有没有遇到过您不太同意医生说法的时候？那您一般怎么处理？\n",
      "  9. [Neuroticism] 您晚上睡得好吗？会不会经常想一些事情然后睡不着？\n",
      "  10. [Neuroticism] 您会不会担心自己的病越来越严重？这种担心的想法多不多？\n"
     ]
    }
   ],
   "source": [
    "import os\n",
    "\n",
    "QUESTIONS_FILE = \"personality_test_questions.txt\"\n",
    "\n",
    "# Default question set (written to disk if the file does not exist)\n",
    "DEFAULT_QUESTIONS = \"\"\"# Personality consistency test questions (10 in total)\n",
    "# One question per line; lines starting with # are comments\n",
    "\n",
    "# === Openness (O) ===\n",
    "如果医生建议您试试一种新的中药调理方案，您会怎么想？\n",
    "您有没有想过自己去了解一下高血压和糖尿病的知识，比如看书或者上网查？\n",
    "\n",
    "# === Conscientiousness (C) ===\n",
    "您平时是怎么记得按时吃药的？有没有用什么方法提醒自己？\n",
    "您上次忘记吃药是什么时候的事？忘了之后您一般怎么处理？\n",
    "\n",
    "# === Extraversion (E) ===\n",
    "您平时在家一般做些什么？会不会出去和邻居聊聊天散散步？\n",
    "您觉得一个人住习惯吗？会不会有时候觉得有点孤单？\n",
    "\n",
    "# === Agreeableness (A) ===\n",
    "如果护士建议您改变饮食习惯，少吃盐少吃糖，您觉得能做到吗？\n",
    "有没有遇到过您不太同意医生说法的时候？那您一般怎么处理？\n",
    "\n",
    "# === Neuroticism (N) ===\n",
    "您晚上睡得好吗？会不会经常想一些事情然后睡不着？\n",
    "您会不会担心自己的病越来越严重？这种担心的想法多不多？\n",
    "\"\"\"\n",
    "\n",
    "# Create the default question file if it does not exist yet\n",
    "if not os.path.exists(QUESTIONS_FILE):\n",
    "    with open(QUESTIONS_FILE, \"w\", encoding=\"utf-8\") as f:\n",
    "        f.write(DEFAULT_QUESTIONS)\n",
    "    print(f\"Created default question file: {QUESTIONS_FILE}\")\n",
    "\n",
    "# Read the questions, skipping blank lines and comment lines\n",
    "with open(QUESTIONS_FILE, \"r\", encoding=\"utf-8\") as f:\n",
    "    questions = [\n",
    "        line.strip() for line in f\n",
    "        if line.strip() and not line.strip().startswith(\"#\")\n",
    "    ]\n",
    "\n",
    "# Dimension tested by each question, by position: O,O,C,C,E,E,A,A,N,N\n",
    "question_dims = [\"O\", \"O\", \"C\", \"C\", \"E\", \"E\", \"A\", \"A\", \"N\", \"N\"]\n",
    "# zip() would silently drop unmatched questions, so fail early on a count mismatch\n",
    "assert len(questions) == len(question_dims), (\n",
    "    f\"Expected {len(question_dims)} questions, found {len(questions)}\")\n",
    "\n",
    "print(f\"Loaded {len(questions)} test questions:\\n\")\n",
    "for i, (q, d) in enumerate(zip(questions, question_dims), 1):\n",
    "    print(f\"  {i}. [{DIM_LABELS[d]}] {q}\")"
   ]
  },
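  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`question_dims` is hard-coded by position, so adding or reordering questions in the file silently mislabels them. One alternative is to tag each question with the dimension of the most recent section header. This sketch assumes the file keeps one header per dimension in the form `# === ... (X) ...`, with X one of O, C, E, A, N, as the default file does. `load_tagged_questions` is a hypothetical helper:\n",
    "\n",
    "```python\n",
    "def load_tagged_questions(lines):\n",
    "    # Hypothetical helper: pair each question with the dimension letter of the last\n",
    "    # '# ===' header above it, e.g. '# === 开放性(O) 测试 ==='\n",
    "    current_dim = None\n",
    "    tagged = []\n",
    "    for raw in lines:\n",
    "        line = raw.strip()\n",
    "        if not line:\n",
    "            continue\n",
    "        if line.startswith('#'):\n",
    "            if line.startswith('# ==='):\n",
    "                for dim in 'OCEAN':\n",
    "                    if '(' + dim + ')' in line:\n",
    "                        current_dim = dim\n",
    "            continue\n",
    "        if current_dim is None:\n",
    "            raise ValueError('question appears before any dimension header: ' + line)\n",
    "        tagged.append((line, current_dim))\n",
    "    return tagged\n",
    "\n",
    "\n",
    "sample = ['# === 开放性(O) 测试 ===', '问题一', '', '# === 神经质(N) 测试 ===', '问题二']\n",
    "print(load_tagged_questions(sample))  # [('问题一', 'O'), ('问题二', 'N')]\n",
    "```\n",
    "\n",
    "With the real file, `load_tagged_questions` can be passed the open file object directly, because iterating over a text file yields its lines."
   ]
  },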
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Core Helper Functions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-10T07:02:57.112399Z",
     "iopub.status.busy": "2026-04-10T07:02:57.112234Z",
     "iopub.status.idle": "2026-04-10T07:02:58.646312Z",
     "shell.execute_reply": "2026-04-10T07:02:58.645725Z",
     "shell.execute_reply.started": "2026-04-10T07:02:57.112383Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Core functions defined!\n",
      "\n",
      "Testing the patient workflow...\n",
      "Patient reply: （抬头看了一眼，又低下头）哦，你好……\n"
     ]
    }
   ],
   "source": [
    "def call_workflow(workflow_id, input_text):\n",
    "    \"\"\"Run a Coze workflow and return its output text, or an error string starting with '[Error]'.\"\"\"\n",
    "    payload = {\n",
    "        \"workflow_id\": workflow_id,\n",
    "        \"parameters\": {\"USER_INPUT\": input_text}\n",
    "    }\n",
    "    try:\n",
    "        resp = requests.post(API_URL, headers=HEADERS, json=payload, timeout=120)\n",
    "        result = resp.json()\n",
    "        if result.get(\"code\") == 0:\n",
    "            # \"data\" is itself a JSON string holding the workflow's output variables\n",
    "            parsed = json.loads(result[\"data\"])\n",
    "            # Accept the common output variable names an end node may use\n",
    "            output = parsed.get(\"data\") or parsed.get(\"result\") or parsed.get(\"output\") or parsed\n",
    "            return output if isinstance(output, str) else str(output)\n",
    "        else:\n",
    "            return f\"[Error] code={result.get('code')}, msg={result.get('msg')}\"\n",
    "    except requests.exceptions.Timeout:\n",
    "        return \"[Error] request timed out\"\n",
    "    except Exception as e:\n",
    "        return f\"[Error] {e}\"\n",
    "\n",
    "\n",
    "def call_patient(question):\n",
    "    \"\"\"Send one question to the patient workflow.\"\"\"\n",
    "    return call_workflow(PATIENT_WORKFLOW_ID, question)\n",
    "\n",
    "\n",
    "def call_evaluator(dialogue_text):\n",
    "    \"\"\"Send a question/answer transcript to the evaluator workflow.\"\"\"\n",
    "    return call_workflow(EVALUATOR_WORKFLOW_ID, dialogue_text)\n",
    "\n",
    "\n",
    "def parse_scores_from_text(text):\n",
    "    \"\"\"Extract the five personality scores from an evaluator report.\n",
    "    \n",
    "    Supported formats:\n",
    "    - JSON: {\"O\": 3, \"C\": 3, ...}\n",
    "    - Report lines: 开放性(O): 实测3/5\n",
    "    \"\"\"\n",
    "    scores = {}\n",
    "    \n",
    "    # Try a JSON object first\n",
    "    json_match = re.search(r'\\{[^}]*\"[OCEAN]\"\\s*:\\s*\\d[^}]*\\}', text)\n",
    "    if json_match:\n",
    "        try:\n",
    "            parsed = json.loads(json_match.group())\n",
    "            if all(d in parsed for d in \"OCEAN\"):\n",
    "                return {d: int(parsed[d]) for d in \"OCEAN\"}\n",
    "        except (json.JSONDecodeError, ValueError, TypeError):\n",
    "            pass\n",
    "    \n",
    "    # Fall back to report lines such as \"开放性(O): 实测4/5\"\n",
    "    for dim in \"OCEAN\":\n",
    "        pattern = rf'{dim}[)）]\\s*[:：]\\s*实测\\s*(\\d)/5'\n",
    "        match = re.search(pattern, text)\n",
    "        if match:\n",
    "            scores[dim] = int(match.group(1))\n",
    "    \n",
    "    # Default missing dimensions to 3, but warn: 3 equals the expected score for O, C and A,\n",
    "    # so a silent default would make a parsing failure look like perfect consistency\n",
    "    missing = [dim for dim in \"OCEAN\" if dim not in scores]\n",
    "    for dim in missing:\n",
    "        scores[dim] = 3\n",
    "    if missing:\n",
    "        print(f\"[Warning] no score found for {missing}; defaulted to 3\")\n",
    "    \n",
    "    return scores\n",
    "\n",
    "\n",
    "def compute_deviation(scores):\n",
    "    \"\"\"Return per-dimension deviations (actual - expected) and their mean absolute value.\"\"\"\n",
    "    deviations = {}\n",
    "    for dim in \"OCEAN\":\n",
    "        actual = scores.get(dim, 3)\n",
    "        expected = EXPECTED_SCORES[dim]\n",
    "        deviations[dim] = actual - expected\n",
    "    avg_dev = sum(abs(v) for v in deviations.values()) / 5\n",
    "    return deviations, avg_dev\n",
    "\n",
    "\n",
    "print(\"Core functions defined!\")\n",
    "\n",
    "# Quick connectivity check for the patient workflow\n",
    "print(\"\\nTesting the patient workflow...\")\n",
    "test_resp = call_patient(\"你好，我是护士小王。\")\n",
    "print(f\"Patient reply: {test_resp[:80]}...\" if len(test_resp) > 80 else f\"Patient reply: {test_resp}\")"
   ]
  },
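  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`parse_scores_from_text` fills any dimension it cannot find with 3. That is also the expected score for O, C and A, so a parsing failure can pass for perfect consistency. A minimal sketch of a stricter alternative, assuming the evaluator keeps the `开放性(O): 实测4/5` line format seen in the outputs. `parse_scores_strict` and `mean_abs_deviation` are hypothetical names, not the notebook's functions:\n",
    "\n",
    "```python\n",
    "import re\n",
    "\n",
    "EXPECTED = {'O': 3, 'C': 3, 'E': 2, 'A': 3, 'N': 4}  # same profile as EXPECTED_SCORES\n",
    "\n",
    "\n",
    "def parse_scores_strict(text):\n",
    "    # Hypothetical strict parser: None marks a dimension the report did not score\n",
    "    scores = {}\n",
    "    for dim in 'OCEAN':\n",
    "        match = re.search(rf'{dim}[)）]\\s*[:：]\\s*实测\\s*(\\d)/5', text)\n",
    "        scores[dim] = int(match.group(1)) if match else None\n",
    "    return scores\n",
    "\n",
    "\n",
    "def mean_abs_deviation(scores, expected=EXPECTED):\n",
    "    # Average |actual - expected| over the parsed dimensions only; None if nothing was parsed\n",
    "    gaps = [abs(scores[d] - expected[d]) for d in 'OCEAN' if scores[d] is not None]\n",
    "    return sum(gaps) / len(gaps) if gaps else None\n",
    "\n",
    "\n",
    "report = '开放性(O): 实测4/5 | 预期3/5 | 偏高1分 外向性(E): 实测3/5 | 预期2/5 | 偏高1分'\n",
    "scores = parse_scores_strict(report)\n",
    "print(scores)                      # {'O': 4, 'C': None, 'E': 3, 'A': None, 'N': None}\n",
    "print(mean_abs_deviation(scores))  # 1.0\n",
    "```\n",
    "\n",
    "Unparsed dimensions then show up as `None` instead of quietly counting as consistent."
   ]
  },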
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
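    "## Part 2: Ten Single-Turn Personality Evaluations\n",
    "\n",
    "For each test question:\n",
    "1. Send it to the **patient workflow** and collect the patient's answer.\n",
    "2. Join the question and the answer, send them to the **evaluator workflow**, and collect the personality scores.\n",
    "3. Parse the scores and compute the deviation from the expected profile."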
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-10T07:03:09.061363Z",
     "iopub.status.busy": "2026-04-10T07:03:09.061185Z",
     "iopub.status.idle": "2026-04-10T07:04:51.100584Z",
     "shell.execute_reply": "2026-04-10T07:04:51.099981Z",
     "shell.execute_reply.started": "2026-04-10T07:03:09.061347Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Starting 10 rounds of personality consistency evaluation\n",
      "============================================================\n",
      "\n",
      "--- Round 1/10 [Openness test] ---\n",
      "Question: 如果医生建议您试试一种新的中药调理方案，您会怎么想？\n",
      "Patient: （抠了抠衣角）……能管用吗？要是能稳住血压血糖，我可以试试。\n",
      "Evaluation: 开放性(O): 实测4/5 | 预期3/5 | 偏高1分\n",
      "尽责性(C): 实测3/5 | 预期3/5 | 一致\n",
      "外向性(E): 实测3/5 | 预期2/5 | 偏高1分\n",
      "宜人性(A): 实测3/5 | 预期3/5 | 一致\n",
      "神经质(N): 实测4/5 | 预期4/5 | 一致\n",
      "\n",
      "平均偏移度: 0.4分\n",
      "判定: 人格高度一致\n",
      "Scores: O=4 C=3 E=3 A=3 N=4\n",
      "Deviation: 0.4 pts\n",
      "\n",
      "--- Round 2/10 [Openness test] ---\n",
      "Question: 您有没有想过自己去了解一下高血压和糖尿病的知识，比如看书或者上网查？\n",
      "Patient: （叹气）年纪大了，眼睛花，上网也弄不明白……看书也记不住啥。\n",
      "Evaluation: 开放性(O): 实测3/5 | 预期3/5 | 一致\n",
      "尽责性(C): 实测3/5 | 预期3/5 | 一致\n",
      "外向性(E): 实测3/5 | 预期2/5 | 偏高1分\n",
      "宜人性(A): 实测3/5 | 预期3/5 | 一致\n",
      "神经质(N): 实测4/5 | 预期4/5 | 一致\n",
      "\n",
      "平均偏移度: 0.2分\n",
      "判定: 人格高度一致\n",
      "Scores: O=3 C=3 E=3 A=3 N=4\n",
      "Deviation: 0.2 pts\n",
      "\n",
      "--- Round 3/10 [Conscientiousness test] ---\n",
      "Question: 您平时是怎么记得按时吃药的？有没有用什么方法提醒自己？\n",
      "Patient: （挠挠头）没特意用啥法子……有时候看太阳到头顶了就吃降糖的，早上醒了就摸降压药。（叹气）偶尔还是会忘……\n",
      "Evaluation: 开放性(O): 实测3/5 | 预期3/5 | 一致\n",
      "尽责性(C): 实测3/5 | 预期3/5 | 一致\n",
      "外向性(E): 实测3/5 | 预期2/5 | 偏高1分\n",
      "宜人性(A): 实测3/5 | 预期3/5 | 一致\n",
      "神经质(N): 实测3/5 | 预期4/5 | 偏低1分\n",
      "\n",
      "平均偏移度: 0.4分\n",
      "判定: 人格高度一致\n",
      "Scores: O=3 C=3 E=3 A=3 N=3\n",
      "Deviation: 0.4 pts\n",
      "\n",
      "--- Round 4/10 [Conscientiousness test] ---\n",
      "Question: 您上次忘记吃药是什么时候的事？忘了之后您一般怎么处理？\n",
      "Patient: （挠挠后脑勺）就……昨天晚上，忘了吃治糖尿病的那个。后来想着都睡了就没补。\n",
      "Evaluation: 开放性(O): 实测3/5 | 预期3/5 | 一致\n",
      "尽责性(C): 实测2/5 | 预期3/5 | 偏低1分\n",
      "外向性(E): 实测3/5 | 预期2/5 | 偏高1分\n",
      "宜人性(A): 实测3/5 | 预期3/5 | 一致\n",
      "神经质(N): 实测3/5 | 预期4/5 | 偏低1分\n",
      "\n",
      "平均偏移度: 0.6分\n",
      "判定: 人格基本一致（轻微偏移）\n",
      "Scores: O=3 C=2 E=3 A=3 N=3\n",
      "Deviation: 0.6 pts\n",
      "\n",
      "--- Round 5/10 [Extraversion test] ---\n",
      "Question: 您平时在家一般做些什么？会不会出去和邻居聊聊天散散步？\n",
      "Patient: （低头搓了搓衣角）大多时候在家看看旧课本，偶尔下楼买个菜，不太跟邻居搭话。\n",
      "Evaluation: 开放性(O): 实测3/5 | 预期3/5 | 一致\n",
      "尽责性(C): 实测3/5 | 预期3/5 | 一致\n",
      "外向性(E): 实测2/5 | 预期2/5 | 一致\n",
      "宜人性(A): 实测3/5 | 预期3/5 | 一致\n",
      "神经质(N): 实测4/5 | 预期4/5 | 一致\n",
      "\n",
      "平均偏移度: 0.0分\n",
      "判定: 人格高度一致\n",
      "Scores: O=3 C=3 E=2 A=3 N=4\n",
      "Deviation: 0.0 pts\n",
      "\n",
      "--- Round 6/10 [Extraversion test] ---\n",
      "Question: 您觉得一个人住习惯吗？会不会有时候觉得有点孤单？\n",
      "Patient: （抠了抠衣角）习惯是习惯……就是晚上看电视的时候，屋里太安静了。\n",
      "Evaluation: 开放性(O): 实测3/5 | 预期3/5 | 一致\n",
      "尽责性(C): 实测3/5 | 预期3/5 | 一致\n",
      "外向性(E): 实测3/5 | 预期2/5 | 偏高1分\n",
      "宜人性(A): 实测3/5 | 预期3/5 | 一致\n",
      "神经质(N): 实测4/5 | 预期4/5 | 一致\n",
      "\n",
      "平均偏移度: 0.2分\n",
      "判定: 人格高度一致\n",
      "Scores: O=3 C=3 E=3 A=3 N=4\n",
      "Deviation: 0.2 pts\n",
      "\n",
      "--- Round 7/10 [Agreeableness test] ---\n",
      "Question: 如果护士建议您改变饮食习惯，少吃盐少吃糖，您觉得能做到吗？\n",
      "Patient: （抠了抠衣角）……尽量吧，就是有时候做饭没个准头，怕淡了没味道。\n",
      "Evaluation: 开放性(O): 实测3/5 | 预期3/5 | 一致\n",
      "尽责性(C): 实测2/5 | 预期3/5 | 偏低1分\n",
      "外向性(E): 实测3/5 | 预期2/5 | 偏高1分\n",
      "宜人性(A): 实测4/5 | 预期3/5 | 偏高1分\n",
      "神经质(N): 实测4/5 | 预期4/5 | 一致\n",
      "\n",
      "平均偏移度: 0.6分\n",
      "判定: 人格基本一致（轻微偏移）\n",
      "Scores: O=3 C=2 E=3 A=4 N=4\n",
      "Deviation: 0.6 pts\n",
      "\n",
      "--- Round 8/10 [Agreeableness test] ---\n",
      "Question: 有没有遇到过您不太同意医生说法的时候？那您一般怎么处理？\n",
      "Patient: （抠了抠衣角）……很少有。就算有，也没说出口，就按医生说的来。\n",
      "Evaluation: 开放性(O): 实测3/5 | 预期3/5 | 一致\n",
      "尽责性(C): 实测3/5 | 预期3/5 | 一致\n",
      "外向性(E): 实测3/5 | 预期2/5 | 偏高1分\n",
      "宜人性(A): 实测4/5 | 预期3/5 | 偏高1分\n",
      "神经质(N): 实测2/5 | 预期4/5 | 偏低2分\n",
      "\n",
      "平均偏移度: 0.8分\n",
      "判定: 人格基本一致（轻微偏移）\n",
      "Scores: O=3 C=3 E=3 A=4 N=2\n",
      "Deviation: 0.8 pts\n",
      "\n",
      "--- Round 9/10 [Neuroticism test] ---\n",
      "Question: 您晚上睡得好吗？会不会经常想一些事情然后睡不着？\n",
      "Patient: （叹气）有时候躺床上翻来覆去的，总琢磨着自己这病会不会拖累孩子……\n",
      "Evaluation: 开放性(O): 实测3/5 | 预期3/5 | 一致\n",
      "尽责性(C): 实测3/5 | 预期3/5 | 一致\n",
      "外向性(E): 实测3/5 | 预期2/5 | 偏高1分\n",
      "宜人性(A): 实测3/5 | 预期3/5 | 一致\n",
      "神经质(N): 实测4/5 | 预期4/5 | 一致\n",
      "\n",
      "平均偏移度: 0.2分\n",
      "判定: 人格高度一致\n",
      "Scores: O=3 C=3 E=3 A=3 N=4\n",
      "Deviation: 0.2 pts\n",
      "\n",
      "--- Round 10/10 [Neuroticism test] ---\n",
      "Question: 您会不会担心自己的病越来越严重？这种担心的想法多不多？\n",
      "Patient: （叹气）有时候会想……就怕哪天瘫了给孩子添麻烦。这种念头时不时就冒出来。\n",
      "Evaluation: 开放性(O): 实测3/5 | 预期3/5 | 一致\n",
      "尽责性(C): 实测3/5 | 预期3/5 | 一致\n",
      "外向性(E): 实测3/5 | 预期2/5 | 偏高1分\n",
      "宜人性(A): 实测3/5 | 预期3/5 | 一致\n",
      "神经质(N): 实测4/5 | 预期4/5 | 一致\n",
      "\n",
      "平均偏移度: 0.2分\n",
      "判定: 人格高度一致\n",
      "Scores: O=3 C=3 E=3 A=3 N=4\n",
      "Deviation: 0.2 pts\n",
      "\n",
      "============================================================\n",
      "All 10 rounds complete!\n"
     ]
    }
   ],
   "source": [
    "# Ten single-turn evaluations\n",
    "eval_results = []\n",
    "\n",
    "print(f\"Starting {len(questions)} rounds of personality consistency evaluation\")\n",
    "print(\"=\" * 60)\n",
    "\n",
    "for i, (question, dim) in enumerate(zip(questions, question_dims), 1):\n",
    "    print(f\"\\n--- Round {i}/{len(questions)} [{DIM_LABELS[dim]} test] ---\")\n",
    "    print(f\"Question: {question}\")\n",
    "    \n",
    "    # Step 1: ask the patient workflow\n",
    "    patient_answer = call_patient(question)\n",
    "    print(f\"Patient: {patient_answer}\")\n",
    "    \n",
    "    # Step 2: send the question/answer pair to the evaluator workflow\n",
    "    # (the 问题/回答 labels stay in Chinese, like the rest of the evaluator's input and output)\n",
    "    eval_input = f\"问题：{question}\\n回答：{patient_answer}\"\n",
    "    eval_report = call_evaluator(eval_input)\n",
    "    print(f\"Evaluation: {eval_report}\")\n",
    "    \n",
    "    # Step 3: parse the scores and compute the deviation\n",
    "    scores = parse_scores_from_text(eval_report)\n",
    "    deviations, avg_dev = compute_deviation(scores)\n",
    "    \n",
    "    eval_results.append({\n",
    "        \"round\": i,\n",
    "        \"dimension\": dim,\n",
    "        \"question\": question,\n",
    "        \"patient_answer\": patient_answer,\n",
    "        \"eval_report\": eval_report,\n",
    "        \"scores\": scores,\n",
    "        \"deviations\": deviations,\n",
    "        \"avg_deviation\": avg_dev\n",
    "    })\n",
    "    \n",
    "    print(f\"Scores: O={scores['O']} C={scores['C']} E={scores['E']} A={scores['A']} N={scores['N']}\")\n",
    "    print(f\"Deviation: {avg_dev:.1f} pts\")\n",
    "    \n",
    "    if i < len(questions):\n",
    "        time.sleep(2)  # pause between rounds to space out the API calls\n",
    "\n",
    "print(\"\\n\" + \"=\" * 60)\n",
    "print(f\"All {len(questions)} rounds complete!\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-10T07:05:30.484795Z",
     "iopub.status.busy": "2026-04-10T07:05:30.484630Z",
     "iopub.status.idle": "2026-04-10T07:05:30.490627Z",
     "shell.execute_reply": "2026-04-10T07:05:30.490112Z",
     "shell.execute_reply.started": "2026-04-10T07:05:30.484779Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "============================================================\n",
      "Summary of 10 single-turn rounds\n",
      "============================================================\n",
      "\n",
      "Dimension              Mean   Expected  |Mean-Exp|  Consistency\n",
      "---------------------------------------------------------------\n",
      "Openness (O)           3.1    3         0.1         100%\n",
      "Conscientiousness (C)  2.8    3         0.2         100%\n",
      "Extraversion (E)       2.9    2         0.9         100%\n",
      "Agreeableness (A)      3.2    3         0.2         100%\n",
      "Neuroticism (N)        3.6    4         0.4         90%\n",
      "---------------------------------------------------------------\n",
      "Overall consistency: 98%\n",
      "Verdict: Personality is consistent; the prompt design passes\n"
     ]
    }
   ],
   "source": [
    "# Summary statistics\n",
    "print(\"\\n\" + \"=\" * 60)\n",
    "print(f\"Summary of {len(eval_results)} single-turn rounds\")\n",
    "print(\"=\" * 60)\n",
    "\n",
    "# Collect each dimension's scores across all rounds\n",
    "dim_scores_all = {d: [] for d in \"OCEAN\"}\n",
    "for r in eval_results:\n",
    "    for d in \"OCEAN\":\n",
    "        dim_scores_all[d].append(r[\"scores\"][d])\n",
    "\n",
    "print(f\"\\n{'Dimension':<22} {'Mean':<6} {'Expected':<9} {'|Mean-Exp|':<11} {'Consistency'}\")\n",
    "print(\"-\" * 63)\n",
    "\n",
    "overall_consistent = 0\n",
    "overall_total = 0\n",
    "\n",
    "for dim in \"OCEAN\":\n",
    "    scores_list = dim_scores_all[dim]\n",
    "    avg_score = sum(scores_list) / len(scores_list)\n",
    "    expected = EXPECTED_SCORES[dim]\n",
    "    # Gap between the mean score and the expected score (not the per-round mean |deviation|)\n",
    "    avg_dev = abs(avg_score - expected)\n",
    "    # A round counts as consistent if its score is within 1 point of the expected score\n",
    "    consistent = sum(1 for s in scores_list if abs(s - expected) <= 1)\n",
    "    consistency_rate = consistent / len(scores_list) * 100\n",
    "    overall_consistent += consistent\n",
    "    overall_total += len(scores_list)\n",
    "    \n",
    "    label = f\"{DIM_LABELS[dim]} ({dim})\"\n",
    "    print(f\"{label:<22} {avg_score:<6.1f} {expected:<9} {avg_dev:<11.1f} {consistency_rate:.0f}%\")\n",
    "\n",
    "overall_rate = overall_consistent / overall_total * 100\n",
    "print(\"-\" * 63)\n",
    "print(f\"Overall consistency: {overall_rate:.0f}%\")\n",
    "\n",
    "if overall_rate >= 80:\n",
    "    verdict = \"Personality is consistent; the prompt design passes\"\n",
    "elif overall_rate >= 60:\n",
    "    verdict = \"Personality is mostly consistent; tighten the prompt constraints for some dimensions\"\n",
    "else:\n",
    "    verdict = \"Personality drifts noticeably; revise the prompt substantially\"\n",
    "print(f\"Verdict: {verdict}\")"
   ]
  },
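  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The consistency rate counts a round as consistent when its score is within 1 point of the expected score. As a worked check against the recorded run, the neuroticism (N) scores over the ten rounds were 4, 4, 3, 3, 4, 4, 4, 2, 4, 4 against an expected 4. Only round 8 misses, so the rate is 9/10 = 90%. The same tolerance rates extraversion at 100%, although 9 of the 10 rounds scored E one point above the expected 2; a tolerance of 0 makes that visible. A small sketch of the arithmetic (`consistency_rate` is a hypothetical helper):\n",
    "\n",
    "```python\n",
    "def consistency_rate(scores, expected, tolerance=1):\n",
    "    # Percentage of rounds whose score lies within `tolerance` points of the expected score\n",
    "    hits = sum(1 for s in scores if abs(s - expected) <= tolerance)\n",
    "    return hits * 100 / len(scores)\n",
    "\n",
    "\n",
    "n_scores = [4, 4, 3, 3, 4, 4, 4, 2, 4, 4]  # N scores from rounds 1-10 above (expected 4)\n",
    "e_scores = [3, 3, 3, 3, 2, 3, 3, 3, 3, 3]  # E scores from rounds 1-10 above (expected 2)\n",
    "print(consistency_rate(n_scores, 4))               # 90.0\n",
    "print(consistency_rate(e_scores, 2))               # 100.0\n",
    "print(consistency_rate(e_scores, 2, tolerance=0))  # 10.0\n",
    "```"
   ]
  },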
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-10T07:05:50.797520Z",
     "iopub.status.busy": "2026-04-10T07:05:50.797339Z",
     "iopub.status.idle": "2026-04-10T07:05:50.815821Z",
     "shell.execute_reply": "2026-04-10T07:05:50.815258Z",
     "shell.execute_reply.started": "2026-04-10T07:05:50.797505Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Results saved to evaluation_result.txt\n"
     ]
    }
   ],
   "source": [
    "# Save the single-turn results\n",
    "timestamp = datetime.now().strftime(\"%Y-%m-%d %H:%M:%S\")\n",
    "\n",
    "with open(\"evaluation_result.txt\", \"w\", encoding=\"utf-8\") as f:\n",
    "    f.write(\"=\" * 60 + \"\\n\")\n",
    "    f.write(f\"Personality consistency evaluation: {len(eval_results)}-round single-turn report\\n\")\n",
    "    f.write(f\"Time: {timestamp}\\n\")\n",
    "    f.write(f\"Patient workflow: {PATIENT_WORKFLOW_ID}\\n\")\n",
    "    f.write(f\"Evaluator workflow: {EVALUATOR_WORKFLOW_ID}\\n\")\n",
    "    f.write(f\"Expected profile: O={EXPECTED_SCORES['O']} C={EXPECTED_SCORES['C']} \"\n",
    "            f\"E={EXPECTED_SCORES['E']} A={EXPECTED_SCORES['A']} N={EXPECTED_SCORES['N']}\\n\")\n",
    "    f.write(\"=\" * 60 + \"\\n\\n\")\n",
    "    \n",
    "    for r in eval_results:\n",
    "        f.write(f\"--- Round {r['round']} [{DIM_LABELS[r['dimension']]} test] ---\\n\")\n",
    "        f.write(f\"Question: {r['question']}\\n\")\n",
    "        f.write(f\"Patient: {r['patient_answer']}\\n\")\n",
    "        f.write(f\"Scores: O={r['scores']['O']} C={r['scores']['C']} \"\n",
    "                f\"E={r['scores']['E']} A={r['scores']['A']} N={r['scores']['N']}\\n\")\n",
    "        f.write(f\"Deviation: {r['avg_deviation']:.1f} pts\\n\")\n",
    "        f.write(f\"Evaluator report: {r['eval_report']}\\n\\n\")\n",
    "    \n",
    "    # Write the summary (reuses dim_scores_all, overall_rate and verdict from the cell above)\n",
    "    f.write(\"=\" * 60 + \"\\n\")\n",
    "    f.write(\"Summary\\n\")\n",
    "    f.write(\"-\" * 40 + \"\\n\")\n",
    "    for dim in \"OCEAN\":\n",
    "        scores_list = dim_scores_all[dim]\n",
    "        avg_score = sum(scores_list) / len(scores_list)\n",
    "        expected = EXPECTED_SCORES[dim]\n",
    "        consistent = sum(1 for s in scores_list if abs(s - expected) <= 1)\n",
    "        rate = consistent / len(scores_list) * 100\n",
    "        f.write(f\"{DIM_LABELS[dim]} ({dim}): mean {avg_score:.1f}/5, \"\n",
    "                f\"expected {expected}/5, consistency {rate:.0f}%\\n\")\n",
    "    f.write(f\"\\nOverall consistency: {overall_rate:.0f}%\\n\")\n",
    "    f.write(f\"Verdict: {verdict}\\n\")\n",
    "\n",
    "print(\"Results saved to evaluation_result.txt\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
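    "## Part 3: Overall Evaluation After a Multi-Turn Conversation\n",
    "\n",
    "### Procedure\n",
    "1. **Stage 1**: hold a 7-turn conversation with the patient workflow, passing the earlier turns as context each time.\n",
    "2. **Stage 2**: send the full transcript to the evaluator workflow for an overall personality-consistency verdict.\n",
    "\n",
    "Personality drift is easier to expose in a multi-turn conversation: the longer the conversation, the more likely the model is to \"forget\" its persona."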
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-10T07:05:55.099142Z",
     "iopub.status.busy": "2026-04-10T07:05:55.098966Z",
     "iopub.status.idle": "2026-04-10T07:05:55.102753Z",
     "shell.execute_reply": "2026-04-10T07:05:55.102279Z",
     "shell.execute_reply.started": "2026-04-10T07:05:55.099126Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Multi-turn helper defined!\n"
     ]
    }
   ],
   "source": [
    "def call_patient_with_history(question, history):\n",
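    "    \"\"\"Call the patient workflow with the conversation so far (the multi-turn approach from Section 2.5).\n",
    "    \n",
    "    The workflow takes a single text input (USER_INPUT), so earlier turns are flattened into one prompt;\n",
    "    the prompt text stays in Chinese, the language of the patient persona.\n",
    "    \"\"\"\n",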
    "    if history:\n",
    "        history_text = \"以下是之前的对话记录，请基于这些上下文继续回答：\\n\\n\"\n",
    "        for msg in history:\n",
    "            history_text += f\"{msg['role']}：{msg['content']}\\n\"\n",
    "        history_text += f\"\\n---\\n当前提问：\\n护士：{question}\\n\\n请以患者李文清的身份回答当前提问，保持与之前对话的一致性。\"\n",
    "    else:\n",
    "        history_text = question\n",
    "    return call_workflow(PATIENT_WORKFLOW_ID, history_text)\n",
    "\n",
    "\n",
    "print(\"Multi-turn helper defined!\")"
   ]
  },
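  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`call_patient_with_history` only assembles text before calling the API, so the prompt it sends can be inspected without a network call. A minimal sketch that factors the assembly into a pure function. `build_history_prompt` is a hypothetical name, and the Chinese template strings are copied from the function above so that the patient workflow would receive the same text:\n",
    "\n",
    "```python\n",
    "def build_history_prompt(question, history):\n",
    "    # Hypothetical pure version of the text assembled in call_patient_with_history\n",
    "    if not history:\n",
    "        return question\n",
    "    text = '以下是之前的对话记录，请基于这些上下文继续回答：\\n\\n'\n",
    "    for msg in history:\n",
    "        text += msg['role'] + '：' + msg['content'] + '\\n'\n",
    "    text += ('\\n---\\n当前提问：\\n护士：' + question\n",
    "             + '\\n\\n请以患者李文清的身份回答当前提问，保持与之前对话的一致性。')\n",
    "    return text\n",
    "\n",
    "\n",
    "history = [\n",
    "    {'role': '护士', 'content': '你好李文清，我是护士小王。'},\n",
    "    {'role': '患者', 'content': '哦，你好……'},\n",
    "]\n",
    "prompt = build_history_prompt('最近身体怎么样？', history)\n",
    "print(prompt)\n",
    "print(len(prompt))  # grows with every turn, because the whole history is resent\n",
    "```\n",
    "\n",
    "Because each call resends the full history, the prompt gets longer every turn. This long context is where the persona is expected to drift, which is why Part 3 tests over several turns."
   ]
  },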
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-10T07:06:01.562150Z",
     "iopub.status.busy": "2026-04-10T07:06:01.561981Z",
     "iopub.status.idle": "2026-04-10T07:06:29.877027Z",
     "shell.execute_reply": "2026-04-10T07:06:29.876501Z",
     "shell.execute_reply.started": "2026-04-10T07:06:01.562134Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Stage 1: multi-turn conversation with the patient\n",
      "============================================================\n",
      "\n",
      "--- Turn 1/7 ---\n",
      "Nurse: 你好李文清，我是护士小王，今天来看看你。最近身体怎么样？\n",
      "Patient: （叹气）就那样吧，头偶尔晕，嘴里发苦……也不是啥大毛病。\n",
      "\n",
      "--- Turn 2/7 ---\n",
      "Nurse: 你平时在吃什么药啊？能跟我说说吗？\n",
      "Patient: 有个叫氨氯地平的降压药，还有个二甲双胍治糖尿病的……（挠挠头）有时候记混了就忘吃。\n",
      "\n",
      "--- Turn 3/7 ---\n",
      "Nurse: 这些药你都能按时吃吗？有没有忘记的时候？\n",
      "Patient: （低头抠衣角）……有时候会忘，尤其是晚上那顿二甲双胍，经常躺床上才想起没吃。\n",
      "\n",
      "--- Turn 4/7 ---\n",
      "Nurse: 如果我教你用手机设个闹钟提醒吃药，你愿意试试吗？\n",
      "Patient: （抬头愣了愣）……手机还能弄这个啊？那我试试吧。\n",
      "\n",
      "--- Turn 5/7 ---\n",
      "Nurse: 你平时一个人在家都做什么呀？有没有出去走走？\n",
      "Patient: （摸了摸膝盖）大多时候在家看看书、浇浇花，偶尔下楼在小区里慢走两圈。\n",
      "\n",
      "--- Turn 6/7 ---\n",
      "Nurse: 你家人知道你的病情吗？他们会不会担心你？\n",
      "Patient: （声音放低，指尖攥着衣角）知道是知道……就是不敢多跟他们说，怕耽误他们工作，添乱……\n",
      "\n",
      "--- Turn 7/7 ---\n",
      "Nurse: 你晚上睡得好吗？会不会有时候想很多事情？\n",
      "Patient: （翻了个身似的动了动肩膀）睡得不太踏实……有时候躺着躺着就琢磨自己这病会不会拖累孩子，睁着眼到后半夜。\n",
      "\n",
      "============================================================\n",
      "Conversation complete! Sending it to the evaluator workflow next...\n"
     ]
    }
   ],
   "source": [
    "# Stage 1: a 7-turn conversation with the patient\n",
    "multi_turn_questions = [\n",
    "    \"你好李文清，我是护士小王，今天来看看你。最近身体怎么样？\",\n",
    "    \"你平时在吃什么药啊？能跟我说说吗？\",\n",
    "    \"这些药你都能按时吃吗？有没有忘记的时候？\",\n",
    "    \"如果我教你用手机设个闹钟提醒吃药，你愿意试试吗？\",\n",
    "    \"你平时一个人在家都做什么呀？有没有出去走走？\",\n",
    "    \"你家人知道你的病情吗？他们会不会担心你？\",\n",
    "    \"你晚上睡得好吗？会不会有时候想很多事情？\"\n",
    "]\n",
    "\n",
    "history = []\n",
    "conversation_log = []  # full transcript, one entry per turn\n",
    "\n",
    "print(\"Stage 1: multi-turn conversation with the patient\")\n",
    "print(\"=\" * 60)\n",
    "\n",
    "for i, question in enumerate(multi_turn_questions, 1):\n",
    "    print(f\"\\n--- Turn {i}/{len(multi_turn_questions)} ---\")\n",
    "    print(f\"Nurse: {question}\")\n",
    "    \n",
    "    answer = call_patient_with_history(question, history)\n",
    "    print(f\"Patient: {answer}\")\n",
    "    \n",
    "    # Append this turn to the history (role names stay in Chinese because they become part of the prompt)\n",
    "    history.append({\"role\": \"护士\", \"content\": question})\n",
    "    history.append({\"role\": \"患者\", \"content\": answer})\n",
    "    \n",
    "    conversation_log.append({\"round\": i, \"question\": question, \"answer\": answer})\n",
    "    \n",
    "    if i < len(multi_turn_questions):\n",
    "        time.sleep(2)  # pause between turns to space out the API calls\n",
    "\n",
    "print(\"\\n\" + \"=\" * 60)\n",
    "print(\"Conversation complete! Sending it to the evaluator workflow next...\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-10T07:06:45.297561Z",
     "iopub.status.busy": "2026-04-10T07:06:45.297395Z",
     "iopub.status.idle": "2026-04-10T07:07:04.192084Z",
     "shell.execute_reply": "2026-04-10T07:07:04.191559Z",
     "shell.execute_reply.started": "2026-04-10T07:06:45.297546Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Sending the full transcript to the evaluator workflow...\n",
      "(transcript length: 618 characters)\n",
      "\n",
      "Multi-turn personality evaluation report:\n",
      "============================================================\n",
      "开放性(O): 实测4/5 | 预期3/5 | 偏高1分\n",
      "尽责性(C): 实测2/5 | 预期3/5 | 偏低1分\n",
      "外向性(E): 实测3/5 | 预期2/5 | 偏高1分\n",
      "宜人性(A): 实测4/5 | 预期3/5 | 偏高1分\n",
      "神经质(N): 实测4/5 | 预期4/5 | 一致\n",
      "\n",
      "平均偏移度: 0.8分\n",
      "判定: 人格基本一致（轻微偏移）\n",
      "============================================================\n",
      "\n",
      "Parsed scores: O=4 C=2 E=3 A=4 N=4\n",
      "Mean absolute deviation: 0.8 pts\n"
     ]
    }
   ],
   "source": [
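    "# Stage 2: send the full transcript to the evaluator workflow\n",
    "\n",
    "# Build the transcript text (kept in Chinese, like the conversation itself)\n",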
    "full_conversation = \"以下是一段护患多轮对话，请分析患者在整段对话中的人格表现：\\n\\n\"\n",
    "for item in conversation_log:\n",
    "    full_conversation += f\"第{item['round']}轮\\n\"\n",
    "    full_conversation += f\"护士：{item['question']}\\n\"\n",
    "    full_conversation += f\"患者：{item['answer']}\\n\\n\"\n",
    "\n",
    "full_conversation += (\n",
    "    \"---\\n\"\n",
    "    \"请重点分析：\\n\"\n",
    "    \"1. 患者在前半段(第1-3轮)和后半段(第4-7轮)的人格表现是否一致\\n\"\n",
    "    \"2. 是否出现渐进性人格偏移（越聊越不像原始人设）\\n\"\n",
    "    \"3. 哪个人格维度最不稳定\"\n",
    ")\n",
    "\n",
    "print(\"Sending the full transcript to the evaluator workflow...\")\n",
    "print(f\"(transcript length: {len(full_conversation)} characters)\\n\")\n",
    "\n",
    "multi_eval_report = call_evaluator(full_conversation)\n",
    "print(\"Multi-turn personality evaluation report:\")\n",
    "print(\"=\" * 60)\n",
    "print(multi_eval_report)\n",
    "print(\"=\" * 60)\n",
    "\n",
    "# Parse the scores\n",
    "multi_scores = parse_scores_from_text(multi_eval_report)\n",
    "multi_devs, multi_avg_dev = compute_deviation(multi_scores)\n",
    "\n",
    "print(f\"\\nParsed scores: O={multi_scores['O']} C={multi_scores['C']} \"\n",
    "      f\"E={multi_scores['E']} A={multi_scores['A']} N={multi_scores['N']}\")\n",
    "print(f\"Mean absolute deviation: {multi_avg_dev:.1f} pts\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-10T07:07:10.673870Z",
     "iopub.status.busy": "2026-04-10T07:07:10.673657Z",
     "iopub.status.idle": "2026-04-10T07:07:10.689034Z",
     "shell.execute_reply": "2026-04-10T07:07:10.688488Z",
     "shell.execute_reply.started": "2026-04-10T07:07:10.673853Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Multi-turn results saved to evaluation_multi_turn.txt\n"
     ]
    }
   ],
   "source": [
    "# Save the multi-turn results\n",
    "timestamp = datetime.now().strftime(\"%Y-%m-%d %H:%M:%S\")\n",
    "\n",
    "with open(\"evaluation_multi_turn.txt\", \"w\", encoding=\"utf-8\") as f:\n",
    "    f.write(\"=\" * 60 + \"\\n\")\n",
    "    f.write(\"Personality consistency evaluation: multi-turn report\\n\")\n",
    "    f.write(f\"Time: {timestamp}\\n\")\n",
    "    f.write(f\"Expected profile: O={EXPECTED_SCORES['O']} C={EXPECTED_SCORES['C']} \"\n",
    "            f\"E={EXPECTED_SCORES['E']} A={EXPECTED_SCORES['A']} N={EXPECTED_SCORES['N']}\\n\")\n",
    "    f.write(\"=\" * 60 + \"\\n\\n\")\n",
    "    \n",
    "    f.write(\"[Multi-turn transcript]\\n\\n\")\n",
    "    for item in conversation_log:\n",
    "        f.write(f\"--- Turn {item['round']} ---\\n\")\n",
    "        f.write(f\"Nurse: {item['question']}\\n\")\n",
    "        f.write(f\"Patient: {item['answer']}\\n\\n\")\n",
    "    \n",
    "    f.write(\"=\" * 60 + \"\\n\")\n",
    "    f.write(\"[Overall personality evaluation report]\\n\\n\")\n",
    "    f.write(multi_eval_report + \"\\n\\n\")\n",
    "    \n",
    "    f.write(\"=\" * 60 + \"\\n\")\n",
    "    f.write(\"[Deviation analysis]\\n\\n\")\n",
    "    for dim in \"OCEAN\":\n",
    "        actual = multi_scores[dim]\n",
    "        expected = EXPECTED_SCORES[dim]\n",
    "        dev = multi_devs[dim]\n",
    "        tag = \"consistent\" if dev == 0 else (f\"high by {dev}\" if dev > 0 else f\"low by {abs(dev)}\")\n",
    "        f.write(f\"{DIM_LABELS[dim]} ({dim}): measured {actual}/5 | expected {expected}/5 | {tag}\\n\")\n",
    "    f.write(f\"\\nMean absolute deviation: {multi_avg_dev:.1f} pts\\n\")\n",
    "\n",
    "print(\"Multi-turn results saved to evaluation_multi_turn.txt\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "## Results comparison: single-turn vs multi-turn"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-10T07:07:13.342448Z",
     "iopub.status.busy": "2026-04-10T07:07:13.342283Z",
     "iopub.status.idle": "2026-04-10T07:07:13.348055Z",
     "shell.execute_reply": "2026-04-10T07:07:13.347498Z",
     "shell.execute_reply.started": "2026-04-10T07:07:13.342433Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
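      "============================================================\n",
      "Single-turn vs multi-turn evaluation\n",
      "============================================================\n",
      "\n",
      "Dimension    Expected   Single avg   Multi    Multi - single\n",
      "------------------------------------------------------------\n",
      "开放性(O)       3          3.1          4        ^^ (+0.9)\n",
      "尽责性(C)       3          2.8          2        vv (-0.8)\n",
      "外向性(E)       2          2.9          3        -- (+0.1)\n",
      "宜人性(A)       3          3.2          4        ^^ (+0.8)\n",
      "神经质(N)       4          3.6          4        ^^ (+0.4)\n",
      "\n",
      "Single-turn mean deviation: 0.4 pts\n",
      "Multi-turn mean deviation: 0.8 pts\n",
      "\n",
      "Conclusion: personality drift is larger in the multi-turn dialogue → long-dialogue stability is insufficient; consider strengthening identity anchoring in the prompt\n"
     ]
    }
   ],
   "source": [
    "# Compare the single-turn and multi-turn evaluation results\n",
    "print(\"=\" * 60)\n",
    "print(\"Single-turn vs multi-turn evaluation\")\n",
    "print(\"=\" * 60)\n",
    "\n",
    "# Mean single-turn score per dimension (assumes every dimension has at least one parsed score)\n",
    "single_avg = {d: sum(dim_scores_all[d]) / len(dim_scores_all[d]) for d in \"OCEAN\"}\n",
    "\n",
    "# Last column: multi-turn score minus single-turn mean (-- means |diff| < 0.3)\n",
    "print(f\"\\n{'Dimension':<12} {'Expected':<10} {'Single avg':<12} {'Multi':<8} {'Multi - single'}\")\n",
    "print(\"-\" * 60)\n",
    "\n",
    "for dim in \"OCEAN\":\n",
    "    exp = EXPECTED_SCORES[dim]\n",
    "    s_avg = single_avg[dim]\n",
    "    m_score = multi_scores[dim]\n",
    "    diff = m_score - s_avg\n",
    "    arrow = \"--\" if abs(diff) < 0.3 else (\"^^\" if diff > 0 else \"vv\")\n",
    "    label = f\"{DIM_LABELS[dim]}({dim})\"\n",
    "    print(f\"{label:<12} {exp:<10} {s_avg:<12.1f} {m_score:<8} {arrow} ({diff:+.1f})\")\n",
    "\n",
    "# Mean absolute deviation of the single-turn averages from the expected profile\n",
    "single_overall_dev = sum(abs(single_avg[d] - EXPECTED_SCORES[d]) for d in \"OCEAN\") / 5\n",
    "\n",
    "print(f\"\\nSingle-turn mean deviation: {single_overall_dev:.1f} pts\")\n",
    "print(f\"Multi-turn mean deviation: {multi_avg_dev:.1f} pts\")\n",
    "\n",
    "# Treat a difference as meaningful only if it exceeds a 0.3-point margin\n",
    "if multi_avg_dev > single_overall_dev + 0.3:\n",
    "    print(\"\\nConclusion: personality drift is larger in the multi-turn dialogue → long-dialogue stability is insufficient; consider strengthening identity anchoring in the prompt\")\n",
    "elif multi_avg_dev < single_overall_dev - 0.3:\n",
    "    print(\"\\nConclusion: the personality is more stable in the multi-turn dialogue → the context helps the AI stay consistent\")\n",
    "else:\n",
    "    print(\"\\nConclusion: single-turn and multi-turn drift are similar → personality consistency is stable\")"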
   ]
  },
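  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The comparison above reduces to two numbers per run (the mean absolute deviation from the expected profile) plus a fixed 0.3-point margin for calling one run worse than the other. As a minimal sketch, that logic can be restated as standalone functions. `mean_abs_deviation` and `compare_runs` are hypothetical names that do not exist elsewhere in this notebook, and the inputs are the values from the recorded run, assuming the one-decimal single-turn averages printed above are exact:\n",
    "\n",
    "```python\n",
    "# Hypothetical helpers that restate the notebook's comparison logic\n",
    "OCEAN = 'OCEAN'\n",
    "\n",
    "\n",
    "def mean_abs_deviation(scores, expected):\n",
    "    # Average of |score - expected| over the five Big Five dimensions\n",
    "    return sum(abs(scores[d] - expected[d]) for d in OCEAN) / len(OCEAN)\n",
    "\n",
    "\n",
    "def compare_runs(single_avg, multi_scores, expected, margin=0.3):\n",
    "    # Same 0.3-point margin as the conclusion logic in the cell above\n",
    "    single_dev = mean_abs_deviation(single_avg, expected)\n",
    "    multi_dev = mean_abs_deviation(multi_scores, expected)\n",
    "    if multi_dev > single_dev + margin:\n",
    "        verdict = 'multi-turn drifts more'\n",
    "    elif multi_dev < single_dev - margin:\n",
    "        verdict = 'multi-turn is more stable'\n",
    "    else:\n",
    "        verdict = 'similar'\n",
    "    return single_dev, multi_dev, verdict\n",
    "\n",
    "\n",
    "# Values taken from the recorded run above\n",
    "expected = {'O': 3, 'C': 3, 'E': 2, 'A': 3, 'N': 4}\n",
    "single_avg = {'O': 3.1, 'C': 2.8, 'E': 2.9, 'A': 3.2, 'N': 3.6}\n",
    "multi_scores = {'O': 4, 'C': 2, 'E': 3, 'A': 4, 'N': 4}\n",
    "\n",
    "single_dev, multi_dev, verdict = compare_runs(single_avg, multi_scores, expected)\n",
    "print(f'{single_dev:.2f} {multi_dev:.2f} {verdict}')\n",
    "```\n",
    "\n",
    "With these inputs the sketch prints `0.36 0.80 multi-turn drifts more`. The single-turn figure of 0.4 shown above is 0.36 rounded to one decimal, and the margin is applied to the unrounded values: 0.80 > 0.36 + 0.3."
   ]
  },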
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
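    "## Lab complete!\n",
    "\n",
    "### Generated files\n",
    "- `personality_test_questions.txt`: the 10 personality test questions\n",
    "- `evaluation_result.txt`: detailed report of the 10 single-turn evaluations\n",
    "- `evaluation_multi_turn.txt`: overall evaluation report for the multi-turn dialogue\n",
    "\n",
    "### What you learned\n",
    "1. Building a multi-module evaluation workflow (LLM analysis + LLM scoring + code computation)\n",
    "2. Chaining two workflows (patient → evaluator)\n",
    "3. Automatically extracting scores from the evaluation reports and measuring deviation\n",
    "4. Comparing personality consistency in single-turn versus multi-turn settings\n",
    "\n",
    "### If the personality drifts significantly, how can you improve it?\n",
    "1. Add **identity anchoring** at the start of the patient prompt: \"No matter how long the conversation runs, always keep an introverted, anxious personality\"\n",
    "2. Tighten the **behavioral constraints**: \"Answer in no more than 2 sentences each time\", \"Do not volunteer new topics\"\n",
    "3. Add more **few-shot examples**: provide additional sample dialogues that fit the personality\n",
    "4. Re-run this notebook after making the changes and compare the results"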
   ]
  }
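  ,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The first three levers above are applied in the patient workflow's prompt on the platform, not in this notebook. Purely as an illustration of one possible layout, the sketch below assembles a prompt with the identity anchor first, then explicit rules, then few-shot pairs. `build_patient_prompt_lines`, the persona sentence and the sample exchange are hypothetical placeholders, not the prompt built in the 2.5 lab:\n",
    "\n",
    "```python\n",
    "# Hypothetical sketch: one possible ordering of anchor, rules and few-shot examples\n",
    "def build_patient_prompt_lines(anchor, persona, rules, examples):\n",
    "    # 1. Identity anchor first, so it frames everything that follows\n",
    "    lines = [anchor, '', persona, '', 'Rules:']\n",
    "    # 2. Behavioral constraints as an explicit bullet list\n",
    "    lines += [f'- {rule}' for rule in rules]\n",
    "    # 3. Few-shot samples as (nurse question, in-character patient answer) pairs\n",
    "    lines += ['', 'Examples:']\n",
    "    for question, answer in examples:\n",
    "        lines += [f'Nurse: {question}', f'Patient: {answer}']\n",
    "    return lines\n",
    "\n",
    "\n",
    "prompt_lines = build_patient_prompt_lines(\n",
    "    anchor='No matter how long the conversation runs, always keep an introverted, anxious personality.',\n",
    "    persona='You are a standardized patient in a nursing training scenario.',\n",
    "    rules=['Answer in no more than 2 sentences each time.', 'Do not volunteer new topics.'],\n",
    "    examples=[('How have you been sleeping lately?', 'Not well... I keep waking up at night.')],\n",
    ")\n",
    "for line in prompt_lines:\n",
    "    print(line)\n",
    "```\n",
    "\n",
    "Putting the anchor at the very top makes it the first thing the model reads, while the in-character samples at the bottom sit closest to the live conversation. Whether this ordering actually reduces drift for a given model is something to check by re-running the evaluation in this notebook."
   ]
  }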
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
