
YAML configuration reference

At the core of the cdf-tk tool are the YAML configuration files. When you run the cdf-tk deploy command, the tool essentially sends the definitions found in these YAML files to the API of your Cognite Data Fusion (CDF) project.

If you are not familiar with how cdf-tk uses variables and environment variables, or with what packages and modules are, familiarize yourself with those concepts first.

If you want to know more about a specific package or module, you can find the information about packages and modules here.

This page describes each of the Cognite Data Fusion resource types supported by the cdf-tk tool. Each module contains a number of directories, and each directory corresponds to a CDF resource type that cdf-tk supports.

./<module_name>/
|- auth/
|- data_models/
|- data_sets/
|- extraction_pipelines/
|- files/
|- functions/
|- raw/
|- transformations/
|- timeseries/
|- timeseries_datapoints/

The format of the YAML files is generally one to one with the API specification for the resource type. The public API documentation for CDF can be found here.

The YAML files are generally recommended to be named equal or similar to the externalId of the resources they define. The filenames are mostly not significant (with a few exceptions) and exist for human readability only. You can also use number prefixes of the form 1.<filename.suffix>, 2.<filename2.suffix> to control the order of deployment within each resource type, as shown below. Beyond indicating the relative order in which to deploy the resource type to CDF, the number prefixes are not significant.
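
For example, the following (hypothetical) filenames within transformations/ ensure that the first transformation is deployed before the second:

./<module_name>/transformations/
|- 1.first_transformation.yaml
|- 2.second_transformation.yaml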

Groups (dir: auth/)

API documentation: Groups

Groups can be found in the module's auth/ directory. Each group has its own YAML file. The file should be named after the group it creates, or something similar. The name field is treated as the unique identifier for the group. Thus, if you change the name of the group manually in Cognite Data Fusion, it will be treated as a different group and not touched by the cdf-tk tool.

The below is an example of a simple group definition.

my_group.yaml
name: 'my_group'
sourceId: '{{mygroup_source_id}}'
metadata:
  origin: 'cdf-project-templates'
capabilities:
  - projectsAcl:
      actions:
        - LIST
        - READ
      scope:
        all: {}

It is recommended to use the metadata:origin property to indicate that the group was created by the cdf-tk tool. This makes it easier to identify groups created by the tool in the user interface. Note that sourceId must always be populated with the group id used by the CDF project's identity provider to identify group membership. It can be a GUID, a number, or a string.

Each Acl capability in CDF can be specified in the same way as the projectsAcl above. Scoping to a data set, space, RAW table, current user, or pipeline is also supported (see below).

Groups and group deletion

When deleting groups with the cdf-tk clean command, the tool will skip groups that the running user/service principal is a member of. This prevents a clean operation from removing access rights from the running user and potentially locking it out from further operations.

Acl scoping

Dataset-scope:

Used to restrict access to only data in a specific data set.

<fragment>
  - threedAcl:
      actions:
        - READ
      scope:
        datasetScope: {
          ids: ['my_dataset']
        }

Id mismatch between CDF API and the YAML file

The groups API operates on an internally assigned id for the data set (as the data set external id can be changed). However, to ensure configuration-as-code, the YAML files do not reference the internal id, but the external id. The cdf-tk tool will resolve the external id to the internal id before sending the request to the CDF API.

Space-scope:

Used to restrict access to only data in a specific data model space.

<fragment>
  - dataModelInstancesAcl:
      actions:
        - READ
      scope:
        spaceIdScope: {
          spaceIds: [
            'my_space'
          ]
        }

Table-scope:

Used to restrict access to only one specific database or even table in a database.

<fragment>
  - rawAcl:
      actions:
        - READ
        - WRITE
      scope:
        tableScope:
          dbsToTables:
            my_database:
              tables: []

Current user-scope:

Used to restrict the actions only to the groups the user is a member of.

<fragment>
  - groupsAcl:
      actions:
        - LIST
        - READ
      scope:
        currentuserscope: {}

Data models (dir: data_models/)

API documentation: Data Modeling

The data models can be found in the module's data_models/ directory. A data model consists of a set of data modeling entities: one or more spaces, containers, views, and data models. Each entity has its own file with a suffix that shows the entity type: something.space.yaml, something.container.yaml, something.view.yaml, something.datamodel.yaml. You can have multiple instances of each entity in a file, but we recommend that you keep one instance per file. In addition to data model entities, cdf-tk also supports creation of nodes. Nodes are defined in files with the suffix .node.yaml. Nodes are sometimes used to keep configuration for applications (such as Infield). In addition, nodes can be used to create node types that are part of the data model.

When creating, updating, and deleting data models, the cdf-tk tool will apply changes in the right order based on the dependencies between the entity types. This means that spaces will be created first as everything else lives in spaces. Then, containers will be created as the keeper of properties and data. After containers, the views will be created as they reference one or more containers, and then finally, the data models will be created as they reference one or more views.

There may also be dependencies between entities of the same type. For example, a view may reference another view. In this case, the referenced view must be created before the view that references it. You can manage this by using number prefixes in the filenames, as shown below. The cdf-tk tool will apply the files ordered by the prefix numbers. The numbering is only significant within each entity type, so a view and a container with the same number prefix will be ordered separately within views and containers respectively.
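
For example, if MyTask references MyActivity, the (hypothetical) prefixes below ensure that MyActivity is applied first:

./<module_name>/data_models/
|- 1.MyActivity.view.yaml
|- 2.MyTask.view.yaml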

Spaces

API documentation: Spaces

Spaces are the top-level entity in a data model. A space is the home of containers, views, and data models. A space can be created with a .space.yaml file in the data_models/ directory.

sp_cognite_app_data.space.yaml
space: sp_cognite_app_data
name: cognite:app:data
description: Space for Infield App Data
caution

Please note that cdf-tk clean will ONLY delete data in spaces that have been explicitly defined by a <space_name>.space.yaml file in the module's data_models/ directory. This is to avoid deleting externally defined spaces that are referenced in the view and data model configurations. If you want a space to be cleaned up, add an explicit space configuration file. Note that CDF will not allow a space to be deleted unless it is completely empty. If a governed space contains, for example, nodes that are not governed by the toolkit, the toolkit will not be able to delete the space.

Containers

API documentation: Containers

Containers are the second-level entity in a data model. A container is the home of properties and data. A container can be created with a .container.yaml file in the data_models/ directory. The YAML supports creation of indexes and constraints according to the API specification.

MyActivity.container.yaml
externalId: MyActivity
usedFor: node
space: sp_activity_data
properties:
  id:
    type:
      type: text
      list: false
      collation: ucs_basic
    nullable: true
  title:
    type:
      type: text
      list: false
      collation: ucs_basic
    nullable: true
  description:
    type:
      type: text
      list: false
      collation: ucs_basic
    nullable: true

The above container definition will create a container with three properties: id, title, and description.

Note that sp_activity_data requires its own activity_data.space.yaml file in the data_models/ directory.

Views

API documentation: Views

Views are the third-level entity in a data model. Views are used to ingest, query, and structure the data into meaningful entities in your data model. A view can be created with a .view.yaml file in the data_models/ directory.

MyActivity.view.yaml
externalId: MyActivity
name: MyActivity
description: "An activity represents a set of maintenance tasks, comprised of multiple operations for individual assets. It provides an overarching description and is considered incomplete until all its operations are finished."
version: '3'
space: sp_activity_model
properties:
  id:
    description: "Unique identifier from the source, such as object ID in SAP."
    container:
      type: container
      space: sp_activity_data
      externalId: MyActivity
    containerPropertyIdentifier: id
  title:
    description: "Concise title or brief description of the maintenance activity or work order."
    container:
      type: container
      space: sp_activity_data
      externalId: MyActivity
    containerPropertyIdentifier: title
  description:
    description: "Detailed explanation of the maintenance activity or work order."
    container:
      type: container
      space: sp_activity_data
      externalId: MyActivity
    containerPropertyIdentifier: description

The above view definition will create a view with three properties: id, title, and description. The view references the properties from the container MyActivity in the space sp_activity_data as defined above. The view lives in a space called sp_activity_model while the container lives in the space sp_activity_data.

Data models

API documentation: Data models

Data models are the fourth and highest-level entity in a data model. Data models are used to structure the data into meaningful knowledge graphs with relationships between views using edges. From an implementation perspective, a data model is simply a collection of views. A data model can be created with a .datamodel.yaml file in the data_models/ directory.

ActivityDataModel.datamodel.yaml
externalId: ActivityDataModel
name: My activity data model
version: '1'
space: sp_activity_model
description: 'A data model for structuring and querying activity data.'
views:
  - type: view
    externalId: MyActivity
    space: sp_activity_model
    version: '3'
  - type: view
    externalId: MyTask
    space: sp_activity_model
    version: '2'

The above data model definition will create a data model with two views: MyActivity and MyTask. The data model lives in a space called sp_activity_model together with the views. The view MyActivity was defined above in the section on views, while MyTask is defined in a separate file (not shown in this example).

Nodes

API documentation: Instances

Nodes are used to populate a data model with instances. Node definitions can be found in the data_models/ directory in files with the suffix .node.yaml.

myapp_config.node.yaml
autoCreateDirectRelations: True
skipOnVersionConflict: False
replace: True
nodes:
  - space: sp_config
    externalId: myapp_config
    sources:
      - source:
          space: sp_config
          externalId: MY_APP_Config
          version: '1'
          type: view
        properties:
          rootLocationConfigurations:
            - assetExternalId: 'my_root_asset_external_id'
              adminGroup:
                - gp_template_admins
              dataSpaceId: sp_activity_data
              modelSpaceId: sp_activity_model
              activityDataModelId: MyActivity
              activityDataModelVersion: '1'

The above node configuration creates a node instance with the externalId myapp_config in the space sp_config. Its properties are written through the view MY_APP_Config (version '1'), which is also found in the space sp_config, and the data is read by MY_APP to configure the application. Here, we configure a root location used in our application, as well as where to find the application's data: the data is stored in the sp_activity_data space (dataSpaceId) and structured according to our previously defined MyActivity model (activityDataModelId, version '1') in the sp_activity_model space (modelSpaceId).

Data sets (dir: data_sets/)

API documentation: Data sets

Data sets cannot be deleted, only archived. Thus, the cdf-tk tool will only create data sets if they do not already exist. A data set can be updated, including its externalId; however, if you change the externalId, the cdf-tk tool will treat the data set as a new one. You can create multiple data sets in the same YAML file.

Referencing data sets from other YAML configuration files

As the external id of a data set can change, CDF operates with references to an internally created and immutable internal id for data sets. Wherever the API references a data set by its internal id, the YAML file uses the external id: instead of the API's dataSetId property, use the special property dataSetExternalId. The cdf-tk tool will then resolve the external id to the internal id. See, for example, time series, files, and extraction pipelines.
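
For example, where the API expects a dataSetId, a resource configuration references the data set like this instead (a minimal fragment; the external ids are illustrative):

<fragment>
- externalId: 'my_timeseries'
  dataSetExternalId: 'ds_asset_hamburg'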

data_sets.yaml
- externalId: ds_asset_hamburg
name: asset:hamburg
description: This dataset contains asset data for the Hamburg location.
- externalId: ds_files_hamburg
name: files:hamburg
description: This dataset contains files for the Hamburg location.

The above configuration will create two data sets using the naming conventions for data sets.

Extraction pipelines (dir: extraction_pipelines/)

API documentation: Extraction pipelines

API documentation: Extraction pipeline config

Documentation: Extraction pipeline documentation

Extraction pipelines and their associated configurations can be found in the module's extraction_pipelines/ directory. Each extraction pipeline has its own YAML file. The configuration for the pipeline should be in a separate file with the same name as the extraction pipeline file, but with .config.yaml as the suffix. The content of the config property in the .config.yaml file is expected to be valid YAML.

ep_src_asset_hamburg_sap.yaml
externalId: 'ep_src_asset_hamburg_sap'
name: 'src:asset:hamburg:sap'
dataSetExternalId: 'ds_asset_{{location_name}}'
description: 'Asset source extraction pipeline with configuration for DB extractor reading data from Hamburg SAP'
rawTables:
  - dbName: 'asset_hamburg_sap'
    tableName: 'assets'
source: 'sap'
documentation: "The DB Extractor is a general database extractor that connects to a database, executes one or several queries and sends the result to CDF RAW.\n\nThe extractor connects to a database over ODBC, which means that you need an ODBC driver for your database. If you are running the Docker version of the extractor, ODBC drivers for MySQL, MS SQL, PostgreSql and Oracle DB are preinstalled in the image. See the example config for details on connection strings for these. If you are running the Windows exe version of the extractor, you must provide an ODBC driver yourself. These are typically provided by the database vendor.\n\nFurther documentation is available [here](./docs/documentation.md)\n\nFor information on development, consider the following guides:\n\n * [Development guide](guides/development.md)\n * [Release guide](guides/release.md)"

The above configuration will create an extraction pipeline with the external id ep_src_asset_hamburg_sap and the name src:asset:hamburg:sap. The main feature of the extraction pipeline is the configuration that allows an extractor installed inside a closed network to connect to CDF and download its configuration file. The configuration file is expected to be in the same directory as the extraction pipeline file, with the same name but with the suffix .config.yaml. The configuration file is not strictly required, as there are use cases for extraction pipelines that do not include remote configuration. However, the tool will issue a warning if the config file is missing during the deploy process.
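
In other words, the two files sit next to each other in the module, for example:

./<module_name>/extraction_pipelines/
|- ep_src_asset_hamburg_sap.yaml
|- ep_src_asset_hamburg_sap.config.yaml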

Also, the extraction pipeline can be connected to a data set as well as the RAW tables that the extractor will write to.

The below is an example of a configuration file for the extraction pipeline above.

ep_src_asset_hamburg_sap.config.yaml
externalId: 'ep_src_asset_hamburg_sap'
description: 'DB extractor config reading data from Hamburg SAP'
config:
  logger:
    console:
      level: INFO
    file:
      level: INFO
      path: "file.log"
  # List of databases
  databases:
    - type: odbc
      name: postgres
      connection-string: "DSN={MyPostgresDsn}"
  # List of queries
  queries:
    - name: test-postgres
      database: postgres
      query: >
        SELECT

Notice that the config property is expected to be valid YAML. The cdf-tk tool will not validate the content of the config property beyond the syntax validation. The extractor that is configured to download the configuration file validates the content of the config property.

Files (dir: files/)

API documentation: Files

Files are primarily supported for example data, and there is no advanced functionality for ingesting files into CDF. Files can be found in the module's files/ directory. You only need a single files.yaml (filename is not important) to specify the metadata for each file you want to upload. There is also a special template format you can use to upload multiple files.

files.yaml
- externalId: 'sharepointABC_my_file.pdf'
  name: 'my_file.pdf'
  source: 'sharepointABC'
  dataSetExternalId: 'ds_files_hamburg'
  directory: 'files'
  mimeType: 'application/pdf'
  metadata:
    origin: 'cdf-project-templates'

Note here how the dataSetExternalId is used to reference the data set. The cdf-tk tool will resolve the external id to the internal id.

If you want to upload multiple files without specifying each file individually, you can use the following template format:

files.yaml
- externalId: sharepointABC_$FILENAME
  dataSetExternalId: ds_files_hamburg
  source: sharepointABC

When you use this template format, the file is expected to contain only this one entry. All files will be uploaded with the same properties except for the externalId and name properties. The $FILENAME variable is replaced with the filename of the file being uploaded, and the name property is always set to that filename.

Functions (dir: functions/)

API documentation: Functions

Functions can be found in the module's functions/ directory. You can define one or more functions in a single yaml file. The cdf-tk tool will create the functions in the order they are defined in the file. The function code and files to deploy to CDF as a function should be found in a sub-directory with the same name as the externalId of the function.

The below is an example of a function definition.

my_functions.yaml
# The directory with the function code must have the same name
# as the externalId of the function defined below.
- name: 'example:repeater'
  externalId: 'fn_example_repeater'
  owner: 'Anonymous'
  description: 'Returns the input data, secrets, and function info.'
  metadata:
    version: '{{version}}'
  secrets:
    mysecret: '{{example_secret}}'
  envVars:
    # The two environment variables below are set by the Toolkit
    ENV_TYPE: '${CDF_BUILD_TYPE}'
    CDF_ENV: '${CDF_ENVIRON}'
  runtime: 'py311'
  functionPath: './src/handler.py'
  # External id of the data set for the zip file with the code that is uploaded.
  externalDataSetId: 'ds_files_{{default_location}}'

The functionPath is the path to the handler.py in the function code directory. In this case, handler.py is expected to be found in the fn_example_repeater/src/ directory.
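
The expected layout is thus something like this (a sketch based on the definition above; requirements.txt is covered under running functions locally below):

./<module_name>/functions/
|- my_functions.yaml
|- fn_example_repeater/
   |- requirements.txt
   |- src/
      |- handler.py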

Note the special parameters in the function definition:

  • externalDataSetId: This field is used to reference the data set that the function itself is assigned to. The cdf-tk tool will resolve the external id to the internal id.

Function schedules (dir: functions/)

API documentation: Schedules

Schedules for functions can also be found in the module's functions/ directory. The yaml file is expected to have schedule in the filename, e.g. schedules.yaml. You can specify more than one schedule in a single file. Schedules will always be deployed after functions to ensure that the function exists before the schedule is created. As the schedules do not have externalIds, the cdf-tk tool identifies the schedule by the combination of the functionExternalId and the cronExpression. This means that you cannot deploy two schedules for a function with the exact same schedule, but with two different sets of data. Adjust the cronExpression slightly to work around this.

schedules.yaml
- name: "daily-8am-utc"
functionExternalId: 'fn_example_repeater'
description: "Run every day at 8am UTC"
cronExpression: "0 8 * * *"
data:
breakfast: "today: peanut butter sandwich and coffee"
lunch: "today: greek salad and water"
dinner: "today: steak and red wine"
authentication:
# Credentials to use to run the function in this schedule.
# In this example, we just use the main deploy credentials, so the result is the same, but use a different set of
# credentials (env variables) if you want to run the function with different permissions.
clientId: {{myfunction_clientId}}
clientSecret: {{myfunction_clientSecret}}
- name: "daily-8pm-utc"
functionExternalId: 'fn_example_repeater'
description: "Run every day at 8pm UTC"
cronExpression: "0 20 * * *"
data:
breakfast: "tomorrow: peanut butter sandwich and coffee"
lunch: "tomorrow: greek salad and water"
dinner: "tomorrow: steak and red wine"

The functionExternalId must match an existing function or a function deployed by the tool. The cronExpression is a standard cron expression. As for transformations, the authentication property is optional and can be used to specify different credentials for the schedule than the default credentials used by the tool. It is recommended to use credentials with the minimum required access rights.

Running functions locally

To speed up the development process, cdf-tk supports running a function locally. This is useful for testing the function code before deploying it to CDF. The cdf-tk tool will use the same environment variables as the deployed function, so you can test the function with the same data and environment variables as it will have in CDF.

A function can be run locally with the following command:

cdf-tk run function --local --payload=\{\"var1\":\ \"testdata\"\} --external_id fn_example_repeater --env dev my_project/

Run cdf-tk run function --help for more information about the options supported.

caution

The function will run in a virtual Python environment using the version of Python you use to run the cdf-tk tool. Running a function locally will automatically do a local build and thus resolve any config.<env>.yaml files and environment variables.

The requirements.txt file in the function code directory will be used to install the required packages in the function's execution environment.

The environment variables configured for the function will be injected into the virtual environment (no variables are injected beyond those needed to run the virtual environment's Python). Secrets are currently not supported; this avoids the potential security issue of transferring secrets into the locally running function.

The payload is passed as a string on the command line and must be interpretable as a JSON dictionary, because the string is converted to a dictionary that is passed into the function as data. The input and output of the function will be written to files in the temporary build directory.

RAW (dir: raw/)

API documentation: RAW

RAW configurations can be found in the module's raw/ directory. You need one YAML file per table you want to load. Each table should have a .csv file with the data, with the same filename as the YAML file. You can omit the csv file if you only want to create the database and table.

asset_hamburg_sap.yaml
dbName: asset_hamburg_sap
tableName: assets

The above configuration will create a RAW database called asset_hamburg_sap with a table called assets.

asset_hamburg_sap.csv
"key","categoryId","sourceDb","parentExternalId","updatedDate","createdDate","externalId","isCriticalLine","description","tag","areaId","isActive"
"WMT:48-PAHH-96960","1152","workmate","WMT:48-PT-96960","2015-10-06 12:28:33","2013-05-16 11:50:16","WMT:48-PAHH-96960","false","VRD - PH STG1 COMP WTR MIST RELEASED : PRESSURE ALARM HIGH HIGH","48-PAHH-96960","1004","true"
"WMT:48-XV-96960-02","1113","workmate","WMT:48-XV-96960","2015-10-08 08:48:04","2009-06-26 15:36:40","WMT:48-XV-96960-02","false","VRD - PH STG1 COMP WTR MIST WTR RLS","48-XV-96960-02","1004","true"
"WMT:23-TAL-96183","1152","workmate","WMT:23-TT-96183","2015-10-06 12:28:32","2013-05-16 11:50:16","WMT:23-TAL-96183","false","VRD - PH 1STSTG COMP OIL TANK HEATER : TEMPERATURE ALARM LOW","23-TAL-96183","1004","true"

Transformations (dir: transformations/)

API documentation: Transformations

Transformations can be found in the module's transformations/ directory. Each transformation has its own YAML file. Each transformation can have a .sql file with the SQL code (to avoid embedding it in the YAML). This .sql file should have the same filename as the YAML file that defines the transformation (without the number prefix), or be named after the externalId of the transformation.

The transformation schedule is a separate resource type, tied to the transformation by external_id.

Example:

tr_asset_oid_workmate_asset_hierarchy.yaml
externalId: 'tr_asset_{{location_name}}_{{source_name}}_asset_hierarchy'
dataSetExternalId: 'ds_asset_{{location_name}}'
name: 'asset:{{location_name}}:{{source_name}}:asset_hierarchy'
destination:
  type: "asset_hierarchy"
ignoreNullFields: true
isPublic: true
conflictMode: upsert
# Specify credentials separately like this:
# You can also use different credentials for the running transformations than the ones you use to deploy
authentication:
  clientId: {{cicd_clientId}}
  clientSecret: {{cicd_clientSecret}}
  tokenUri: {{cicd_tokenUri}}
  # Optional: If idP requires providing the cicd_scopes
  cdfProjectName: {{cdfProjectName}}
  scopes: {{cicd_scopes}}
  # Optional: If idP requires providing the cicd_audience
  audience: {{cicd_audience}}

tr_asset_oid_workmate_asset_hierarchy.schedule.yaml
externalId: 'tr_asset_{{location_name}}_{{source_name}}_asset_hierarchy'
interval: '{{scheduleHourly}}'
isPaused: {{pause_transformations}}

tr_asset_oid_workmate_asset_hierarchy.sql
SELECT
  externalId as externalId,
  if(parentExternalId is null,
     '',
     parentExternalId) as parentExternalId,
  tag as name,
  sourceDb as source,
  description,
  dataset_id('{{asset_dataset}}') as dataSetId,
  to_metadata_except(
    array("sourceDb", "parentExternalId", "description"), *)
  as metadata
FROM
  `{{asset_raw_input_db}}`.`{{asset_raw_input_table}}`

Note:

  • The transformation can be configured with both a source and a destination set of credentials (sourceOidcCredentials and destinationOidcCredentials). Using authentication: is a shortcut that configures both to the same set of credentials. If you want different credentials for the source and destination, use the sourceOidcCredentials and destinationOidcCredentials properties instead, as sketched after this list.
  • The schedule is optional. If you do not specify a schedule, the transformation will be created but not scheduled. You can then schedule it manually in the CDF UI or using the CDF API. Schedule is a separate API endpoint in CDF.
  • You can specify the SQL inline in the transformation YAML file using the query property (string), but a separate .sql file is recommended for readability.
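
The split credentials follow the same shape as the authentication block; a minimal sketch, where the source_ and destination_ variable names are placeholders:

<fragment>
sourceOidcCredentials:
  clientId: {{source_clientId}}
  clientSecret: {{source_clientSecret}}
  tokenUri: {{cicd_tokenUri}}
  cdfProjectName: {{cdfProjectName}}
  scopes: {{cicd_scopes}}
destinationOidcCredentials:
  clientId: {{destination_clientId}}
  clientSecret: {{destination_clientSecret}}
  tokenUri: {{cicd_tokenUri}}
  cdfProjectName: {{cdfProjectName}}
  scopes: {{cicd_scopes}}
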
caution

In the above example, the transformation re-uses the globally defined credentials for the cdf-tk tool, i.e., it will run as the full admin user. This is not recommended for production use. You should create a service account with the minimum required access rights.

Configure two new variables in the config.yaml of the module:

abc_clientId: ${ABC_CLIENT_ID}
abc_clientSecret: ${ABC_CLIENT_SECRET}

In the environment (CI/CD pipeline), you need to set the ABC_CLIENT_ID and ABC_CLIENT_SECRET environment variables to the credentials of the application/service principal configured in your identity provider for the transformation.
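
The transformation can then reference these variables in its authentication block; a sketch, assuming the variables defined above (the remaining fields are unchanged from the earlier example):

<fragment>
authentication:
  clientId: {{abc_clientId}}
  clientSecret: {{abc_clientSecret}}
  tokenUri: {{cicd_tokenUri}}
  cdfProjectName: {{cdfProjectName}}
  scopes: {{cicd_scopes}}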

Timeseries (dir: timeseries/)

API documentation: Timeseries

The timeseries support in cdf-tk is primarily for example data. There is no advanced functionality for ingesting timeseries into CDF. The timeseries can be found in the module's timeseries/ directory. You only need a single timeseries.yaml (filename is not important) to specify the timeseries.

timeseries.yaml
- externalId: 'pi_160696'
  name: 'VAL_23-PT-92504:X.Value'
  dataSetExternalId: ds_timeseries_hamburg
  isString: false
  metadata:
    compdev: '0'
    location5: '2'
    pointtype: Float32
    convers: '1'
    descriptor: PH 1stStgSuctCool Gas Out
    contextMatchString: 23-PT-92504
    contextClass: VAL
    digitalset: ''
    zero: '0'
    filtercode: '0'
    compdevpercent: '0'
    compressing: '0'
    tag: 'VAL_23-PT-92504:X.Value'
  isStep: false
  description: PH 1stStgSuctCool Gas Out
- externalId: 'pi_160702'
  name: 'VAL_23-PT-92536:X.Value'
  dataSetExternalId: ds_timeseries_hamburg
  isString: false
  metadata:
    compdev: '0'
    location5: '2'
    pointtype: Float32
    convers: '1'
    descriptor: PH 1stStgComp Discharge
    contextMatchString: 23-PT-92536
    contextClass: VAL
    digitalset: ''
    zero: '0'
    filtercode: '0'
    compdevpercent: '0'
    compressing: '0'
    tag: 'VAL_23-PT-92536:X.Value'

The above configuration creates two timeseries in the data set ds_timeseries_hamburg with the external ids pi_160696 and pi_160702.

caution

Timeseries should normally be created as part of ingestion into CDF. That is, you should configure the data pipelines with corresponding data sets, databases, groups, and so on using the templates and cdf-tk, and let the ingestion process itself create the timeseries.

Timeseries datapoints (dir: timeseries_datapoints/)

API documentation: Timeseries

The timeseries support in cdf-tk is primarily for example data. There is no advanced functionality for ingesting timeseries into CDF. The timeseries datapoints can be found in the module's timeseries_datapoints/ directory. The timeseries must have been created separately using the timeseries directory, see timeseries.

datapoints.csv
timestamp,pi_160696,pi_160702
2013-01-01 00:00:00,0.9430412044195982,0.9212588490581821
2013-01-01 01:00:00,0.9411303320132799,0.9212528389403117
2013-01-01 02:00:00,0.9394743147709556,0.9212779911470234
2013-01-01 03:00:00,0.9375842300608798,
2013-01-01 04:00:00,0.9355836846172971,0.9153202184209938

The above csv file loads data into the timeseries created in the previous example. The first column is the timestamp, and the following columns are the values for the timeseries at that timestamp.

tip

If you specify the column name of the timestamp as timeshift_timestamp instead of timestamp in the above example file, the cdf-tk build command will automatically timeshift the entire timeseries to end at today's date. This is useful, for example, for demo data where you want a timeseries that is always up to date.
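
For example, the datapoints file above would then start like this (same values, only the header column renamed):

datapoints.csv
timeshift_timestamp,pi_160696,pi_160702
2013-01-01 00:00:00,0.9430412044195982,0.9212588490581821
2013-01-01 01:00:00,0.9411303320132799,0.9212528389403117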