{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Document question answering\n",
        "\n",
        "Document question answering is an AI-powered assistant that uses the [Semantic search API](./semantic_search) to find information relevant to a question. It then passes the retrieved information to a large language model (LLM), which turns it into a natural-language answer.\n",
        "\n",
        "This notebook shows you how to upload a PDF to Cognite Data Fusion (CDF) and use the Document question answering API to ask questions about the uploaded document. All PDF documents uploaded to CDF automatically pass through a Retrieval Augmented Generation (RAG) pipeline: the documents are parsed and processed with optical character recognition (OCR), and all the contained information is indexed for the Semantic search API."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "trusted": true
      },
      "outputs": [],
      "source": [
        "from cognite.client import CogniteClient\n",
        "\n",
        "# Instantiate the Cognite SDK client\n",
        "client = CogniteClient()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Step 1. Upload PDF\n",
        "\n",
        "You can upload a PDF file to CDF in one of the following ways:\n",
        "\n",
        "* Go to **_CDF_** > **_Industrial tools_** > **_Canvas_** and drag your PDF file onto the canvas, or select **_+ Add data_** to upload it.  \n",
        "  If you don't have a suitable file, try this [test file](./well_report.pdf).\n",
        "\n",
        "* Go to **_CDF_** > **_Industrial tools_** > **_Data explorer_** > **_Files_** and select **_Upload_**.\n",
        "\n",
        "* Use the Cognite Python SDK, as in the following cell."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "trusted": true
      },
      "outputs": [],
      "source": [
        "# Upload the PDF; the returned file metadata includes the new file's CDF ID\n",
        "response1 = client.files.upload(path=\"./well_report.pdf\")\n",
        "document_id = response1.id\n",
        "print(document_id)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Step 2. Processing\n",
        "\n",
        "After you upload the file, wait for it to pass through the RAG pipeline. Use the Document status API to poll the processing status until the document is no longer waiting or in progress."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "trusted": true
      },
      "outputs": [],
      "source": [
        "import time\n",
        "\n",
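        "# Document status endpoint (called with the alpha API version header)\n",
        "status_path = f\"/api/v1/projects/{client.config.project}/documents/status\"\n",
        "\n",
        "# Request body identifying the uploaded document\n",
        "body = {\n",
        "    \"items\": [\n",
        "        {\n",
        "            \"id\": document_id\n",
        "        }\n",
        "    ]\n",
        "}\n",
        "\n",
        "# Poll every 5 seconds while the document is waiting or in progress\n",
        "while True:\n",
        "    response2 = client.post(status_path, json=body, headers={\"cdf-version\": \"alpha\"}).json()\n",
        "\n",
        "    status = response2[\"items\"][0][\"semanticsearch\"][\"status\"]\n",
        "    print(f\"status: {status}\")\n",
        "\n",
        "    if status in {\"waiting\", \"progress\"}:\n",
        "        time.sleep(5)\n",
        "        continue\n",
        "\n",
        "    # Any other status means the document is no longer pending\n",
        "    break"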
      ]
    },
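    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The loop above polls until the status leaves `waiting` or `progress`, but it has no upper bound on how long it waits. The hypothetical `wait_for_status` function below is a minimal sketch of a reusable helper that adds a timeout (an arbitrary 10-minute default). It accepts any zero-argument callable that returns a status string, so you can try the polling logic without calling CDF. The `completed` value in the demo is a made-up placeholder, not a documented status.\n",
        "\n",
        "```python\n",
        "import time\n",
        "from typing import Callable, Iterable\n",
        "\n",
        "\n",
        "def wait_for_status(\n",
        "    fetch_status: Callable[[], str],\n",
        "    pending: Iterable[str] = (\"waiting\", \"progress\"),\n",
        "    interval: float = 5.0,\n",
        "    timeout: float = 600.0,\n",
        ") -> str:\n",
        "    \"\"\"Hypothetical helper: poll fetch_status() until it returns a non-pending status.\"\"\"\n",
        "    pending = set(pending)\n",
        "    deadline = time.monotonic() + timeout\n",
        "    while True:\n",
        "        status = fetch_status()\n",
        "        if status not in pending:\n",
        "            return status  # no longer pending: stop polling\n",
        "        if time.monotonic() >= deadline:\n",
        "            raise TimeoutError(f\"status still {status!r} after {timeout} s\")\n",
        "        time.sleep(interval)\n",
        "\n",
        "\n",
        "# Offline demo with a fake status sequence (no CDF call):\n",
        "fake_statuses = iter([\"waiting\", \"progress\", \"completed\"])\n",
        "print(wait_for_status(lambda: next(fake_statuses), interval=0.0))\n",
        "```\n",
        "\n",
        "In this notebook, `fetch_status` could wrap the `client.post(status_path, ...)` call from the cell above and return `response2[\"items\"][0][\"semanticsearch\"][\"status\"]`. Raising `TimeoutError` instead of looping forever surfaces a stalled document rather than hanging the notebook."
      ]
    },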
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Step 3. Ask questions\n",
        "\n",
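        "Once the document is fully indexed, you can ask questions about it. The following cell sends a question to the Document question answering API and passes the uploaded file's ID in `fileIds`."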
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "trusted": true
      },
      "outputs": [],
      "source": [
        "import json\n",
        "\n",
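        "# Document question answering endpoint (called with the beta API version header)\n",
        "ask_path = f\"/api/v1/projects/{client.config.project}/ai/tools/documents/ask\"\n",
        "\n",
        "# The question, plus the IDs of the files to search for an answer\n",
        "body = {\n",
        "    \"question\": \"Where is the Volve field located?\",\n",
        "    \"fileIds\": [\n",
        "        {\n",
        "            \"id\": document_id\n",
        "        }\n",
        "    ]\n",
        "}\n",
        "\n",
        "response3 = client.post(ask_path, json=body, headers={\"cdf-version\": \"beta\"}).json()\n",
        "\n",
        "# Pretty-print the full JSON response\n",
        "print(json.dumps(response3, indent=2))"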
      ]
    },
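    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Because `fileIds` is a list, the same request shape can presumably name several uploaded documents at once. That is an assumption this notebook does not demonstrate. Here is a minimal sketch of a hypothetical `build_ask_body` helper that produces a body with the same structure as the cell above, for one or more file IDs:\n",
        "\n",
        "```python\n",
        "import json\n",
        "from typing import Iterable\n",
        "\n",
        "\n",
        "def build_ask_body(question: str, file_ids: Iterable[int]) -> dict:\n",
        "    \"\"\"Hypothetical helper: build a Document question answering request body.\"\"\"\n",
        "    # Each file is referenced by an {\"id\": ...} object, as in the cell above\n",
        "    ids = [{\"id\": file_id} for file_id in file_ids]\n",
        "    if not ids:\n",
        "        raise ValueError(\"at least one file ID is required\")\n",
        "    return {\"question\": question, \"fileIds\": ids}\n",
        "\n",
        "\n",
        "# Placeholder file IDs, for illustration only:\n",
        "print(json.dumps(build_ask_body(\"Where is the Volve field located?\", [123, 456]), indent=2))\n",
        "```\n",
        "\n",
        "You could then send the result the same way as above, for example `client.post(ask_path, json=build_ask_body(question, [document_id]), headers={\"cdf-version\": \"beta\"})`, where `question` is your own string."
      ]
    },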
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "trusted": true
      },
      "outputs": [],
      "source": []
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python (Pyodide)",
      "language": "python",
      "name": "python"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "python",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 4
}