# Document question answering

Document question answering is an AI-powered assistant that uses the [Semantic search API](./semantic_search) to find information relevant to your question, then passes the retrieved information to an LLM, which turns it into a natural language answer.

This notebook shows you how to upload a PDF to Cognite Data Fusion (CDF) and use the Document question answering API to ask questions about the uploaded document. All PDF documents uploaded to CDF automatically pass through a Retrieval Augmented Generation (RAG) pipeline: the documents are parsed and OCRed, and their content is indexed for the Semantic search API.
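To make the retrieve-then-generate idea concrete, here is a toy sketch of the two RAG stages. It is not how CDF implements its pipeline: word overlap stands in for semantic search, the passages are made up, and `tokenize`, `retrieve`, and `build_prompt` are hypothetical helpers introduced only for illustration.

```python
def tokenize(text):
    """Lowercase a string and split it into a set of words."""
    return {word.strip(".,?!").lower() for word in text.split()}


def retrieve(question, passages, top_k=1):
    """Rank passages by word overlap with the question (a stand-in for semantic search)."""
    question_words = tokenize(question)
    ranked = sorted(
        passages,
        key=lambda passage: len(question_words & tokenize(passage)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, context_passages):
    """Assemble the text that would be sent to an LLM for interpretation."""
    context = "\n".join(context_passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Made-up passages, for illustration only.
passages = [
    "The pump was inspected in March.",
    "The valve is located on deck two.",
]
question = "Where is the valve located?"
top_passages = retrieve(question, passages)
prompt = build_prompt(question, top_passages)
print(prompt)
```

In the real service, both stages run server-side, so the notebook only needs to upload a document and send a question.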

In [None]:
from cognite.client import CogniteClient
from cognite.client.exceptions import CogniteAPIError

# Instantiate Cognite SDK client:
client = CogniteClient()

## Step 1. Upload PDF

You can upload a PDF file to CDF in one of the following ways:

* Go to **_CDF_** > **_Industrial tools_** > **_Canvas_** and drag your PDF file to the canvas or upload existing files by selecting **_+ Add data_**. 
 If you don't have a good file to upload, try this [test file](./well_report.pdf).

* Go to **_CDF_** > **_Industrial tools_** > **_Data explorer_** > **_Files_** and select **_Upload_**.

* Use the Python SDK, as in the following code.

In [None]:
response1 = client.files.upload(path="./well_report.pdf")
document_id = response1.id
print(document_id)
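If you rerun the notebook, you may prefer an upload that can be repeated safely. The sketch below wraps `client.files.upload` in a hypothetical helper, `upload_pdf`, which checks the path first and passes an external ID together with `overwrite=True`. The helper name and the external ID are assumptions for illustration, so check the SDK documentation for the exact overwrite behavior in your version.

```python
from pathlib import Path


def upload_pdf(client, path, external_id):
    """Upload a PDF and return its CDF file ID.

    `upload_pdf` is a hypothetical helper, not part of the Cognite SDK.
    """
    pdf_path = Path(path)
    if pdf_path.suffix.lower() != ".pdf":
        raise ValueError(f"Expected a .pdf file, got: {pdf_path.name}")
    if not pdf_path.is_file():
        raise FileNotFoundError(f"No such file: {pdf_path}")

    # Assumption: with an external ID set, overwrite=True allows the call
    # to be repeated for the same file.
    file_metadata = client.files.upload(
        path=str(pdf_path),
        external_id=external_id,
        overwrite=True,
    )
    return file_metadata.id
```

With the client from the first cell, this could be called as `document_id = upload_pdf(client, "./well_report.pdf", external_id="well_report")`.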

## Step 2. Ask questions

Once the document is uploaded, we can start asking our questions.

It may take some time before the document is fully processed and ready, so we wrap the API call in a while loop and retry until we get an answer.

In [None]:
import json
import time

ask_path = f"/api/v1/projects/{client.config.project}/ai/tools/documents/ask"

body = {
    "question": "Where is the Volve field located?",
    "fileIds": [
        {
            "id": document_id
        }
    ]
}

while True:
    try:
        response2 = client.post(ask_path, json=body).json()
        break

    except CogniteAPIError as e:
        # A 422 with missing items means the document is still being processed.
        if e.code == 422 and e.missing:
            print("Not ready yet, waiting 5 seconds...")
            time.sleep(5)
            continue

        # Re-raise any unexpected exceptions.
        raise

print(json.dumps(response2, indent=2))
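The loop above retries indefinitely, so it never finishes if the document never becomes ready. As a possible alternative, the sketch below caps the number of attempts. `call_with_retries` and its parameters are hypothetical names, not part of the Cognite SDK.

```python
import time


def call_with_retries(call, is_not_ready, max_attempts=10, delay_seconds=5.0, sleep=time.sleep):
    """Call `call()` until it succeeds or `max_attempts` is reached.

    `is_not_ready(exc)` decides whether an exception means "try again later".
    Any other exception propagates immediately.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception as exc:
            if not is_not_ready(exc):
                raise
            if attempt == max_attempts:
                raise TimeoutError(f"Still not ready after {max_attempts} attempts") from exc
            sleep(delay_seconds)


# Possible use with the names from the cells above (requires a configured client):
# response2 = call_with_retries(
#     lambda: client.post(ask_path, json=body).json(),
#     lambda e: isinstance(e, CogniteAPIError) and e.code == 422 and bool(e.missing),
# )
```

Passing `sleep` as a parameter keeps the helper easy to test without waiting. The default of 10 attempts at 5 seconds each is an arbitrary choice, so adjust it to how long processing takes for your documents.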