Files
In Cognite Data Fusion, the file resource type stores files and documents that are related to one or more assets. For example, a file can contain a piping and instrumentation diagram (P&ID) that shows how several assets connect.
About files
Files are created in two steps: the first step stores the metadata in a file object, and the second step uploads the file contents. This means that a file object can exist in Cognite Data Fusion before its contents have been uploaded.
Each file has a unique id that's generated at file creation. Specify a fileName when the file is created.
If you want to be in control of the file identifier, you can specify an externalId, which must be unique within a project.
A file can also have metadata: key-value fields that are searchable. You can use these fields to store source system IDs and other information.
Additionally, files can have labels attached to them, making it easier to organize and categorize files.
You can retrieve the information for a file, including both standard and dynamic metadata fields, using the files list or search REST API calls. You can download the file contents with the file download REST API call.
See the Files API documentation to learn more about working with the files API.
See the Labels API documentation to learn more about managing Labels.
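The two-step creation described above can be sketched as follows. This is a minimal illustration, not the official client: send is a hypothetical transport callable that you would supply, and both the uploadUrl response field and the use of PUT for the upload are assumptions to verify against the Files API documentation.

```python
from typing import Callable, Optional

# Hypothetical transport: send(method, url, body) -> parsed JSON dict (or None).
# In real code this would wrap an authenticated HTTP client.
Send = Callable[[str, str, object], Optional[dict]]


def create_and_upload(send: Send, base_url: str, project: str,
                      name: str, content: bytes) -> dict:
    """Create the file object first, then upload the contents."""
    # Step 1: store the metadata. The file object now exists without contents.
    file_info = send("POST", f"{base_url}/api/v1/projects/{project}/files",
                     {"name": name})

    # Step 2: upload the contents to the URL returned by step 1.
    # ASSUMPTION: the response carries the target in an "uploadUrl" field.
    send("PUT", file_info["uploadUrl"], content)
    return file_info
```

Passing the transport in as a parameter keeps the sketch independent of any particular HTTP library and makes the two steps easy to exercise with a stub.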
Geographic location of files
Specify a file's geographic location, for example, its geometric features and coordinates, in the geoLocation field. Data in this field must follow the GeoJSON specification, explained in detail in RFC 7946. The coordinate reference system for all GeoJSON coordinates is a geographic coordinate reference system that uses the World Geodetic System 1984 (WGS84).
GeoJSON types
A GeoJSON object has one of 3 types:
- Feature - Geometric objects with (optional) extra features.
- FeatureCollection - A collection of Features.
- GeometryCollection - A collection of Geometry objects (see below).
Currently, Feature is the only type supported by Cognite Data Fusion.
To describe a geographic feature, the Geometry object needs a type and a corresponding array of coordinates. Below are the supported Geometry types:
Type | Description | Example
---|---|---
Point | Only one exact point. | {"type": "Point", "coordinates": [30, 10]}
MultiPoint | Multiple points. | {"type": "MultiPoint", "coordinates": [[10, 40], [40, 30], [20, 20], [30, 10]]}
LineString | A line. | {"type": "LineString", "coordinates": [[30, 10], [10, 30], [40, 40]]}
MultiLineString | Multiple lines. | {"type": "MultiLineString", "coordinates": [[[10, 10], [20, 20], [10, 40]], [[40, 40], [30, 30], [40, 20], [30, 10]]]}
Polygon | A closed shape. Can have inner holes of arbitrary shapes. | {"type": "Polygon", "coordinates": [[[35, 10], [45, 45], [15, 40], [10, 20], [35, 10]], [[20, 30], [35, 35], [30, 20], [20, 30]]]}
MultiPolygon | Multiple closed shapes. Can have inner holes of arbitrary shapes. | {"type": "MultiPolygon", "coordinates": [[[[30, 20], [45, 40], [10, 40], [30, 20]]], [[[15, 5], [40, 10], [10, 20], [5, 10], [15, 5]]]]}
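One rough way to sanity-check a geometry before sending it is to compare the nesting depth of its coordinates array with the depth its type implies in the table above. The sketch below is illustrative only: the helper names are made up for this example, and it checks nesting depth alone, not coordinate values or ring closure.

```python
# Expected nesting depth of the coordinates array for each supported type:
# a Point is a flat [x, y] pair (depth 1), a Polygon is a list of rings of
# points (depth 3), and so on.
EXPECTED_DEPTH = {
    "Point": 1,
    "MultiPoint": 2,
    "LineString": 2,
    "MultiLineString": 3,
    "Polygon": 3,
    "MultiPolygon": 4,
}


def nesting_depth(coordinates) -> int:
    """Return how many list levels wrap the innermost numbers."""
    depth = 0
    while isinstance(coordinates, list):
        if not coordinates:
            raise ValueError("empty coordinates array")
        depth += 1
        coordinates = coordinates[0]  # descend along the first element
    return depth


def has_expected_shape(geometry: dict) -> bool:
    """Check that the coordinates nesting matches the geometry type."""
    expected = EXPECTED_DEPTH.get(geometry.get("type"))
    return expected is not None and nesting_depth(geometry["coordinates"]) == expected
```

Because the lookup is case-sensitive, a type such as "point" is rejected rather than silently accepted.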
Adding geoLocation to a file
The geoLocation field has the following properties:
type
The type of GeoJSON. Cognite Data Fusion only supports the Feature type.
geometry
Represents the points, curves, and surfaces in coordinate space. The property consists of:
- type - Must be one of the following geometry types: Point, MultiPoint, LineString, MultiLineString, Polygon, and MultiPolygon. See GeoJSON types above.
- coordinates - An array describing the specified geometry type. The type of geometry determines the shape of this array. For instance, a Point geometry type will contain a coordinates array consisting of just a single x and a single y coordinate. See example 1 below.
A LineString geometry type will contain a coordinates array with two or more points, as shown in example 2. A Polygon geometry type will need to contain an array of closed LineStrings with four or more points, as shown in example 3. See the GeoJSON spec for more details on the various shapes of the coordinates field.
properties
An optional field specifying extra information to enrich the Feature.
Example 1
{
  "type": "Feature",
  "geometry": {
    "type": "Point",
    "coordinates": [10.727414488792418, 59.91713955864316]
  },
  "properties": {
    "name": "Norwegian Royal Palace"
  }
}
Example 2
{
  "type": "Feature",
  "geometry": {
    "type": "LineString",
    "coordinates": [
      [35, 10],
      [45, 45],
      [15, 40],
      [10, 20],
      [35, 10]
    ]
  }
}
Example 3
Note that in this example, the Polygon specifies an outer and inner LineString.
A Polygon can (but doesn't have to) contain several of these LineStrings, where the first must be the exterior ring, and the next LineStrings are interior rings. This is how you would define a surface with holes.
{
  "type": "Feature",
  "geometry": {
    "type": "Polygon",
    "coordinates": [
      [
        [35, 10],
        [45, 45],
        [15, 40],
        [10, 20],
        [35, 10]
      ],
      [
        [20, 30],
        [35, 35],
        [30, 20],
        [20, 30]
      ]
    ]
  }
}
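The ring rules for a Polygon (each ring is a closed LineString with four or more points, and the first ring is the exterior) can be checked with a few lines of code. This is a minimal sketch with hypothetical helper names; it does not check winding order or whether the interior rings actually lie inside the exterior ring.

```python
def is_valid_ring(ring) -> bool:
    """A ring is a closed LineString: four or more points, first == last."""
    return len(ring) >= 4 and ring[0] == ring[-1]


def check_polygon(coordinates) -> None:
    """Raise ValueError if any ring of a Polygon is not a valid closed ring."""
    if not coordinates:
        raise ValueError("a Polygon needs at least an exterior ring")
    for index, ring in enumerate(coordinates):
        if not is_valid_ring(ring):
            # The first ring is the exterior; all later rings are holes.
            kind = "exterior" if index == 0 else "interior"
            raise ValueError(
                f"{kind} ring {index} is not closed or has fewer than 4 points"
            )


# The polygon with one hole from the example above passes the check.
check_polygon([
    [[35, 10], [45, 45], [15, 40], [10, 20], [35, 10]],
    [[20, 30], [35, 35], [30, 20], [20, 30]],
])
```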
Upload example
POST /api/v1/projects/publicdata/files
Content-Type: application/json
{
  "name": "file1",
  "externalId": "file",
  "geoLocation": {
    "type": "Feature",
    "geometry": {
      "type": "Point",
      "coordinates": [
        10.727414488792418,
        59.91713955864316
      ]
    },
    "properties": {
      "name": "Norwegian Royal Palace"
    }
  }
}
Update example
POST /api/v1/projects/publicdata/files/update
Content-Type: application/json
{
  "items": [
    {
      "id": 123454321,
      "update": {
        "geoLocation": {
          "set": {
            "type": "Feature",
            "geometry": {
              "type": "Point",
              "coordinates": [
                133.2,
                2.5
              ]
            },
            "properties": {
              "name": "Another place"
            }
          }
        }
      }
    }
  ]
}
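The nested update body is easy to get wrong by hand, so it can help to build it in code. The sketch below only constructs the request body shown in the update example; the function name is hypothetical and sending the request is left out.

```python
import json
from typing import Optional


def geolocation_update(file_id: int, geometry: dict,
                       properties: Optional[dict] = None) -> dict:
    """Build the body for a files/update request that sets geoLocation."""
    feature = {"type": "Feature", "geometry": geometry}
    if properties is not None:
        feature["properties"] = properties  # optional extra information
    return {"items": [{"id": file_id, "update": {"geoLocation": {"set": feature}}}]}


body = geolocation_update(
    123454321,
    {"type": "Point", "coordinates": [133.2, 2.5]},
    {"name": "Another place"},
)
print(json.dumps(body, indent=2))
```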
GeoLocation filtering
Filtering on, or searching for, files matching a certain geoLocation requires two properties:
- relation - The geographic relation, either INTERSECTS, WITHIN, or DISJOINT.
- shape - The geometry, as described in the geometry section. Filtering is not available for the MultiPoint type.
Filter example
POST /api/v1/projects/publicdata/files/list
Content-Type: application/json
{
  "filter": {
    "geoLocation": {
      "relation": "INTERSECTS",
      "shape": {
        "type": "MultiPolygon",
        "coordinates": [[
          [[35, 10], [45, 45], [15, 40], [10, 20], [35, 10]],
          [[20, 30], [35, 35], [30, 20], [20, 30]]
        ], [
          [[36, 11], [46, 46], [16, 41], [11, 21], [36, 11]],
          [[21, 31], [36, 36], [31, 21], [21, 31]]
        ]]
      }
    }
  }
}
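A small builder can enforce the two filter rules stated above: the relation must be one of the three listed values, and MultiPoint is not available as a filter shape. This is a sketch with hypothetical names; normalizing the relation to upper case is a convenience chosen for this example, not documented API behavior.

```python
RELATIONS = {"INTERSECTS", "WITHIN", "DISJOINT"}
# MultiPoint is left out because filtering is not available for it.
FILTER_SHAPES = {"Point", "LineString", "MultiLineString", "Polygon", "MultiPolygon"}


def geolocation_filter(relation: str, shape: dict) -> dict:
    """Build the body for a files/list request that filters on geoLocation."""
    relation = relation.upper()
    if relation not in RELATIONS:
        raise ValueError(f"unknown relation: {relation}")
    shape_type = shape.get("type")
    if shape_type not in FILTER_SHAPES:
        raise ValueError(f"shape type not supported for filtering: {shape_type}")
    return {"filter": {"geoLocation": {"relation": relation, "shape": shape}}}
```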
Rate and Concurrency Limits
There are limits on the rate of requests (RPS) and the number of parallel requests. A request exceeding the limits will result in a 429 error response: Too Many Requests.
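A client that may hit these limits typically retries after a 429 response with an increasing delay. The sketch below shows exponential backoff, which is a common client-side pattern and not something this page prescribes. The request callable, the attempt count, and the delays are all assumptions made for illustration.

```python
import time
from typing import Callable, Tuple


def call_with_backoff(request: Callable[[], Tuple[int, object]],
                      max_attempts: int = 5,
                      base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep):
    """Call request() until it stops returning 429 or attempts run out.

    request is a hypothetical callable returning (status_code, payload).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, payload = request()
        if status != 429:
            return status, payload
        if attempt < max_attempts - 1:
            # Double the wait after every rejected attempt: 0.5 s, 1 s, 2 s, ...
            sleep(base_delay * 2 ** attempt)
    # Still rate-limited after the last attempt: hand the 429 to the caller.
    return status, payload
```

The sleep function is a parameter so the retry logic can be exercised without real waiting.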
Limits are defined at both the API service and endpoint levels. Each type of request has a different budget because resource consumption varies. For example, there are two types of requests: CRUD (Create (Upload, Upload multipart, Complete multipart), Retrieve, Request ByIDs, Get icon, Download, Update, and Delete) and Analytical (Aggregate, List, Search, and Filter). CRUD requests are less resource-intensive than Analytical requests. Among all Analytical requests, Aggregates are the most resource-intensive, so they receive their own request budget within the overall Analytical request budget.
The limits for the API service and its endpoints are shown in the diagram below. These limits are subject to change based on consumption patterns and resource availability over time. Changes to limits are announced in the changelog.
Translate RPS to data speed
A single request can retrieve up to 1000 items, where 1 item is a file record. The top API service level has a maximum theoretical data speed of 160,000 items per second for all consumers and 120,000 for a single identity or client in a project.
Use of parallel retrieval
Parallel retrieval improves data retrieval performance when query complexity makes retrieval slower than it would be with a fast, simple query. Use parallel retrieval to retrieve large data sets up to the capacity limits defined for an API service.
For example, Files API requests have the following limits:
- A single request can retrieve up to 1000 items.
- Up to 23 requests per second may be issued for an analytical query (per identity), such as when using /list or /filter API endpoints (see above diagram).
Resulting in:
- A theoretical maximum of 23,000 items read per second per identity.
Additionally, complex analytical queries may return data slower than the theoretical maximum. Typically, the more complex the query, the slower the data rate.
Resulting in:
- A single request taking longer than 1s to read or write 1000 items.
Therefore, for complex analytical queries that return data slower than the theoretical maximum, retrieve fewer items per request and issue more requests in parallel until the theoretical maximum performance of 23,000 items per second is reached.
Use parallel retrieval only when a single request flow provides data retrieval speeds significantly less than the theoretical maximum. The overall requests per second limit still applies regardless of the number of concurrent requests issued. For example, if a request returns data at 18,000 items per second, adding a second parallel request provides little benefit as only 5,000 more items can be returned before the budget limit is reached.
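The arithmetic behind these figures can be written out as a short calculation. The sketch below uses the numbers quoted above (23 requests per second, 1000 items per request); the function names are hypothetical, and it assumes every request returns a full page of items.

```python
def max_items_per_second(requests_per_second: int, items_per_request: int = 1000) -> int:
    """Theoretical ceiling: request budget times the per-request item cap."""
    return requests_per_second * items_per_request


def remaining_headroom(requests_per_second: int, observed_items_per_second: int,
                       items_per_request: int = 1000) -> int:
    """Items per second still available before the request budget is spent."""
    ceiling = max_items_per_second(requests_per_second, items_per_request)
    return max(0, ceiling - observed_items_per_second)


print(max_items_per_second(23))        # 23000
print(remaining_headroom(23, 18_000))  # 5000
print(remaining_headroom(23, 4_000))   # 19000
```

A small headroom, as in the 18,000 items per second case, suggests that extra parallel requests will add little; a large one, as in the 4,000 case, is where parallel retrieval pays off.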