Using Templates

This section covers steps 3-5. For steps 1-2, see Getting started.

Step | Command | Description
1 | cdf-tk init <proj_dir> | Create a new configuration folder, cd <proj_dir>, and initialise the project.
2 | cdf-tk auth verify --interactive | Check that you have access to the project and create a .env file. This step can be skipped if you have already configured environment variables. Alternatively, run cdf-tk auth verify just to verify that everything works.
3 | Edit config.<env>.yaml in <proj_dir> | Specify the modules you want to deploy for each environment you deploy to. config.<env>.yaml also contains all the variables the modules expect. Change the variables for the modules that are relevant for your deployments.
4 | cdf-tk --verbose build --env=dev | Build the configurations to the build/ directory using the config.dev.yaml configuration file.
5 | cdf-tk deploy --dry-run --env=dev | Test-deploy the configurations to your CDF project from the build/ directory. Then remove --dry-run to actually push the configurations to the project.

Introduction

When you are starting out, your Cognite Data Fusion (CDF) project is empty. You need to set up the project with a core structure that fits with how you want to work with CDF. Depending on your industry and use case, you may want tailored data models, and the systems where your existing data resides will have different naming schemes and structures, so you need to configure CDF to extract, transform, and contextualise your data.

Although CDF is a flexible platform that can be configured to fit most use cases, there are some common patterns and use cases that are useful to have as a starting point. The CDF Toolkit comes with a set of templates that you can use to get started with your project.

What is a template?

A template is a set of configurations that can be deployed to a CDF project through Cognite's open APIs. You don't need to know the APIs to use the templates, but if you create your own configurations, it is useful to know that they mirror the APIs. Technically, the templates are a set of YAML-formatted files that follow the CDF API specifications and thus allow you to describe textually how the CDF project should be set up.

When we refer to templates, we mean the configuration sets that are quality assured and come bundled with the cdf-tk tool. To start using the templates, you run cdf-tk init <folder> to create a new local project folder with the templates pre-installed. You then edit the configuration variables that tailor the templates to your project, and you can add new configurations of your own. You can also modify and adapt the templates to fit your project.
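
For example, assuming you name the project folder my_project (the folder name is purely illustrative), the first two steps from the table above look like this:

cdf-tk init my_project
cd my_project
cdf-tk auth verify --interactive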

Modules and packages

The simplest possible template is a single yaml text file that configures a small, simple thing in a CDF project, like a group or a data set. Then, a group of such yaml files can be put together into a module. A module is a bundle of CDF configurations that logically belong together, are deployed together, and that gives you a certain functionality in your CDF project. For example, the Infield application needs a set of configurations that are shared with other Asset Performance Management (APM) use cases and applications, including the APM data model. These configurations are bundled and found in the cdf_apm_base module.

A package is just a list of modules that are deployed together in a specific order. For example, the cdf_infield package will give you all the modules necessary for Infield to work.

The pre-installed templates are found as modules in the modules/, examples/, common/, and experimental/ directories below the cognite_modules directory in your project directory. You are free to edit the configurations in these modules (or copy them to custom_modules/), but if you leave them unedited, you get the benefit of being able to run cdf-tk init --upgrade to get the latest version of the templates installed into your project.
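
To give a rough picture, the relevant parts of the project directory are laid out approximately like this (the exact set of directories and modules depends on the templates you have installed):

<proj_dir>/
  cognite_modules/
    common/
    examples/
    experimental/
    modules/
  custom_modules/
  config.dev.yaml
  config.prod.yaml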

Practical steps

The basic flow is as follows: First you build the templates to resolve variables (as defined in config.<env>.yaml) and gather the modules that should be deployed. Then you deploy what was built to the CDF project environment of choice:
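In terms of commands, using the dev environment as an example, the flow boils down to:

cdf-tk build --env=dev
cdf-tk deploy --dry-run --env=dev
cdf-tk deploy --env=dev
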

Configuring what to deploy

This step describes how to configure what to deploy to each of your project environments. These are configured in the config.<env>.yaml files found in the root of your project directory. <env> is the name of the environment you want to manage. By default, two environments are created: dev and prod. You can create any number of environments by copying a config.<env>.yaml file, changing the <env> in the file name, and editing the environment.name property.
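
For example, assuming you want a third environment called test (the name is purely illustrative), you could create it like this and then edit the environment.name property in the new file:

cp config.dev.yaml config.test.yaml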

If you want to configure the dev environment, you edit the config.dev.yaml file. Open up config.dev.yaml in the root of the project directory you created with cdf-tk init <folder>. This file is the starting point for how your project is configured. It defines a set of environments, and for each environment, it defines which modules to deploy.

Here is a snippet of the config.dev.yaml file that defines the environment:

environment:
  name: dev
  project: <customer-dev>
  type: dev
  selected_modules_and_packages:
    - cdf_demo_infield
    - cdf_oid_example_data
  common_function_code: ./common_function_code

Edit the project property to match the name of your CDF project. This is used as a safety measure to ensure that you don't accidentally deploy to the wrong project. The type property is used to distinguish between different types of environments; it is not currently used by the cdf-tk tool (but will be used to support migrations in the future). The selected_modules_and_packages property is a list of modules and packages to deploy. The modules can be found in any of the module directories below the cognite_modules and custom_modules directories. Finally, the common_function_code property is the path to a directory where you can put common code that is used by your functions. The default code found in common_function_code is used to support local execution of functions.
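
If you have placed a module of your own in custom_modules/, you select it the same way. For example (my_custom_module is a hypothetical name):

selected_modules_and_packages:
  - cdf_demo_infield
  - cdf_oid_example_data
  - my_custom_module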

Testing Functions locally

Development of functions

The toolkit repository might not be the ideal environment for active code development, due to how modules are "packaged" in the directory hierarchy (it is easy to get lost). A suggested way of working is to think of the toolkit repository (commit history) as snapshots of a fully working state. Thus, we recommend developing and testing functions separately, and then copying in the verified files.

With that disclaimer out of the way, here's a guide to running locally:

To run, for example, fn_context_files_oid_fileshare_annotation, simply call the file handler.py normally from the root folder of the toolkit, or from any folder below it, as long as you don't enter the "package" itself, i.e. fn_context_files_oid_fileshare_annotation:

cognite_toolkit/
  cognite_modules/
    examples/
      cdf_data_pipeline_files_valhall/
        functions/

Assuming you have navigated to functions, the full command would be (you may skip poetry run if you have already activated your virtual environment):

poetry run python fn_context_files_oid_fileshare_annotation/handler.py

This works because a special run_locally method has been added (and the imports have been made to work). If the required environment variables, mostly for authentication towards CDF, are not set correctly, an error listing them will be raised (you may inspect the file directly to see which ones are needed).
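
For illustration only, the overall pattern is roughly the following. This is a simplified sketch, not the actual contents of handler.py; the environment variable names and the handle signature are assumptions:

import os

# Hypothetical names, for illustration only; the real handler.py lists its own.
REQUIRED_ENV_VARS = ["CDF_CLUSTER", "CDF_PROJECT", "IDP_CLIENT_ID", "IDP_CLIENT_SECRET"]

def handle(client, data):
    # The logic that runs inside CDF Functions (signature simplified).
    ...

def run_locally():
    # Fail early with a clear list of missing environment variables.
    missing = [var for var in REQUIRED_ENV_VARS if not os.environ.get(var)]
    if missing:
        raise ValueError(f"Missing environment variables: {missing}")
    # Build a CDF client from the environment and call handle() here.

if __name__ == "__main__":
    run_locally()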

Changing the default variables

Each module has variables you may want to change to adapt to your project. It may be the name of your default location (e.g. plant/asset/site), or other settings. The configuration variables can be found in the same config.<env>.yaml file as the environment configuration, further down in the file, in the modules section.

tip

You are free to delete modules that you don't need, both from the cognite_modules directory and in the config.<env>.yaml file.

Most of the variables are set to default values that are useful for the example data set that comes with the templates. You can deploy the configurations as they are without changing these variables, but you probably want to adapt them to your project. Other variables are set to <change_me>. If these are not changed, some functionality will not work.
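
As a rough illustration of the shape, the modules section nests variables under the module they belong to; the exact nesting follows the directory layout in your project, so it may differ from this sketch:

modules:
  common:
    cdf_auth_readwrite_all:
      readwrite_source_id: <change_me>
      readonly_source_id: <change_me>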

Module-specific configurations

Here are the default modules and how to edit their variables. If you don't use these modules, the sections below serve as examples of how to configure the modules you do use.

common: cdf_auth_readwrite_all

readwrite_source_id: <change_me>
readonly_source_id: <change_me>

These are the group ids from your identity provider. The readwrite_source_id should be the group id of the group you created for the cdf-tk tool. The readonly_source_id should be the group id of a group that administrators belong to, so they can use the Fusion UI and API to read data.

core: cdf_apm_base

apm_datamodel_space: 'APM_SourceData'
apm_datamodel_version: '1'

These should not be changed; they will only be changed with new versions of the APM data model.

examples: cdf_oid_example_data

default_location: oid
source_asset: workmate
source_workorder: workmate
source_files: fileshare
source_timeseries: pi

Each of the source_* variables should simply be the name of the system where the data originates. The defaults match the example data set that comes with the templates ("Open Industrial Data" or OID).

The default_location is the default location (plant/asset/site) that is used in the example data set. Here we just use oid, but this should be something short and meaningful to your project, like houston or plantY.
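
For your own project, the values might look something like this (the location and source system names are purely illustrative):

default_location: houston
source_asset: sap
source_workorder: sap
source_files: fileshare
source_timeseries: pi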

infield: cdf_infield_common

applicationsconfiguration_source_id: <change_me>

Users that are members of this group in your identity provider will be able to configure Infield.

infield: cdf_infield_location

default_location: oid
module_version: '1'
apm_datamodel_space: APM_SourceData
apm_app_config_external_id: default-infield-config-minimal
apm_config_instance_space: APM_Config
source_asset: workmate
source_workorder: workmate
workorder_raw_db: workorder_oid_workmate
workorder_table_name: workorders
root_asset_external_id: WMT:VAL
infield_default_location_checklist_admin_users_source_id: <change_me>
infield_default_location_normal_users_source_id: <change_me>
infield_default_location_template_admin_users_source_id: <change_me>
infield_default_location_viewer_users_source_id: <change_me>

The Infield location module references the cdf_oid_example_data module by default. This means that if you change the source_* variables there (or create your own example data module), you need to change them here as well. Note also that workorder_raw_db and workorder_table_name reference the actual database and table name in RAW where the work order data is stored. So, if you changed the cdf_oid_example_data module's default_location to my_location and source_workorder to my_workorder, the workorder_raw_db here would be workorder_my_location_my_workorder.
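
To make the dependency concrete, here is how the values from that (purely illustrative) example would line up across the two modules:

# in cdf_oid_example_data
default_location: my_location
source_workorder: my_workorder

# in cdf_infield_location
default_location: my_location
source_workorder: my_workorder
workorder_raw_db: workorder_my_location_my_workorder
workorder_table_name: workorders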

You also need to change the root_asset_external_id to the root asset of your project. This is the asset that is the parent of all other assets. Finally, you need to configure the group ids from your identity provider for the different Infield roles. These will typically be new groups that you create for Infield use. The corresponding CDF groups will be created automatically when you deploy the configurations.

Building

Once you have configured what to deploy and changed the variables you need to change, you can build the configurations: cdf-tk build --clean --env=dev

This will substitute the variables into the templates and create a build/ directory with the configurations that will be deployed. The --env=dev flag specifies that the config.dev.yaml file you edited in the previous step should be used.

Deploying

Finally, you can deploy the configurations to your CDF project. Do a test: cdf-tk deploy --dry-run --env=dev

And then you can drop the --dry-run to deploy for real: cdf-tk deploy --env=dev

The deploy command does a diff against the CDF project and only deploys what has changed, updating the existing configurations in the CDF project. This ensures that run history, logs, etc. are kept. However, if you want to ensure that you deploy from a clean state, you can clean up configurations before deploying using --drop: cdf-tk deploy --env=dev --drop

You can even add --drop-data to also delete all the data that is managed by the configurations (this is a dangerous operation and similar to clean below): cdf-tk deploy --env=dev --drop --drop-data

If you want to delete everything in your project that is managed by your configurations, you can use (use with caution!): cdf-tk clean --dry-run --env=dev

(and then drop --dry-run when you understand what it's going to do).

Next steps

Once you have tried out the scripts and how to use the templates to deploy to your CDF project, the next step is to set up a CI/CD pipeline where you can deploy to your staging and production environments as part of your development workflow. The advanced documentation explains in more detail how to use a DevOps approach and build modules for your own projects.