{ "metadata": { "kernelspec": { "display_name": "Python (Pyodide)", "language": "python", "name": "python" }, "language_info": { "codemirror_mode": { "name": "python", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8" } }, "nbformat_minor": 4, "nbformat": 4, "cells": [ { "cell_type": "markdown", "source": "# Document question answering\n\nDocument question answering is an AI-powered assistant. With the help of the [Semantic search API](./semantic_search), it finds the information relevant to your question and passes what it retrieves to a large language model (LLM), which turns it into a natural-language answer.\n\nThis notebook shows you how to upload a PDF to Cognite Data Fusion (CDF) and use the Document question answering API to ask questions about the uploaded document. All PDF documents uploaded to CDF automatically pass through a Retrieval Augmented Generation (RAG) pipeline: the documents are parsed and OCRed, and their content is indexed for the Semantic search API.", "metadata": {} }, { "cell_type": "code", "source": "from cognite.client import CogniteClient\nfrom cognite.client.exceptions import CogniteAPIError\n\n# Instantiate Cognite SDK client:\nclient = CogniteClient()", "metadata": { "trusted": true }, "outputs": [], "execution_count": null }, { "cell_type": "markdown", "source": "## Step 1. Upload PDF\n\nYou can upload a PDF file to CDF in one of the following ways:\n\n* Go to **_CDF_** > **_Industrial tools_** > **_Canvas_** and drag your PDF file onto the canvas, or upload existing files by selecting **_+ Add data_**. 
\n If you don't have a suitable file to upload, try this [test file](./well_report.pdf).\n\n* Go to **_CDF_** > **_Industrial tools_** > **_Data explorer_** > **_Files_** and select **_Upload_**.\n\n* Use the Python SDK, as in the following cell.", "metadata": {} }, { "cell_type": "code", "source": "# Upload the PDF and keep its file ID for the question below.\nresponse1 = client.files.upload(path=\"./well_report.pdf\")\ndocument_id = response1.id\nprint(document_id)", "metadata": { "trusted": true }, "outputs": [], "execution_count": null }, { "cell_type": "markdown", "source": "## Step 2. Ask questions\n\nOnce the document is uploaded, we can start asking questions.\n\nIt may take some time before the document is fully processed and ready to query, so we wrap the API call in a `while` loop and retry until we get an answer.", "metadata": {} }, { "cell_type": "code", "source": "import json\nimport time\n\nask_path = f\"/api/v1/projects/{client.config.project}/ai/tools/documents/ask\"\n\nbody = {\n    \"question\": \"Where is the Volve field located?\",\n    \"fileIds\": [\n        {\n            \"id\": document_id\n        }\n    ]\n}\n\nwhile True:\n    try:\n        response2 = client.post(ask_path, json=body).json()\n        break\n\n    except CogniteAPIError as e:\n        # Treat a 422 that reports missing items as the document not being ready yet.\n        if e.code == 422 and e.missing:\n            print(\"Not ready yet, waiting 5 seconds...\")\n            time.sleep(5)\n            continue\n\n        # Re-raise any unexpected exceptions.\n        raise\n\nprint(json.dumps(response2, indent=2))", "metadata": { "trusted": true }, "outputs": [], "execution_count": null }, { "cell_type": "code", "source": "", "metadata": { "trusted": true }, "outputs": [], "execution_count": null } ] }