The CDF Toolkit (cdf-tk) is currently in beta. It should be stable and mature enough to bootstrap and configure Cognite Data Fusion projects. However, if you use the tool to manage production projects, we recommend testing it in a staging project first so you know exactly what the tool will do. Part of the beta is gathering feedback on the tool and improving how it supports project lifecycle management, so please engage with us on hub.cognite.com.
This section covers steps 3-5. For steps 1-2, see Getting started.
cdf-tk init <proj_dir>
    Create a new configuration folder, cd <proj_dir>, and initialise the project.
cdf-tk auth verify --interactive
    Check that you have access to the project and create a .env file. You can skip this step if you have already configured environment variables. Alternatively, run cdf-tk auth verify just to verify that everything works.
Edit config.<env>.yaml
    Specify the modules you want to deploy for each environment you deploy to. config.<env>.yaml also contains all the variables the modules expect; change the variables for the modules that are relevant to your deployments.
cdf-tk --verbose build --env=dev
    Build the configurations into the build/ directory using the config.dev.yaml configuration file.
cdf-tk deploy --dry-run --env=dev
    Test-deploy the configurations from the build/ directory to your CDF project, then remove --dry-run to actually push the configurations to the project.
When you are starting out, your Cognite Data Fusion (CDF) project is empty. You need to set up the project with a core structure that fits with how you want to work with CDF. Depending on your industry and use case, you may want tailored data models, and the systems where your existing data resides will have different naming schemes and structures, so you need to configure CDF to extract, transform, and contextualise your data.
Although CDF is a flexible platform that can be configured to fit most use cases, there are some common patterns and use cases that are useful to have as a starting point. The CDF Toolkit comes with a set of templates that you can use to get started with your project.
What is a template?
A template is a set of configurations that can be deployed to a CDF project through Cognite's open APIs. You don't need to know the APIs to use the templates, but if you create your own configurations, it is useful to know that they mirror the APIs. Technically, a template is a set of YAML-formatted files that follow the CDF API specifications and thus let you describe textually how the CDF project should be set up.
When we refer to templates, we mean the quality-assured configuration sets that come bundled with the cdf-tk tool. To start using the templates, run cdf-tk init <folder> to create a new local project folder with the templates pre-installed. You then edit the configuration variables that tailor the templates to your project, and you can add new configurations of your own. You can also modify and adapt the templates to fit your project.
Modules and packages
The simplest possible template is a single YAML file that configures one small, simple thing in a CDF project, like a group or a data set.
A group of such YAML files can then be put together into a module. A module is a bundle of CDF configurations that logically belong together, are deployed together, and give you a certain functionality in your CDF project. For example, the Infield application needs a set of configurations that are shared with other Asset Performance Management (APM) use cases and applications, including the APM data model. These configurations are bundled together in a module.
A package is just a list of modules that are deployed together in a specific order. For example, the Infield package gives you all the modules necessary for Infield to work.
The pre-installed templates are found as modules in the directories below the cognite_modules directory in your project directory. You are free to edit the configurations in these modules (or copy them to custom_modules/), but if you leave them unedited, you get the benefit of being able to run cdf-tk init --upgrade to install the latest version of the templates into your project.
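Assuming the defaults described above, a freshly initialised project directory looks roughly like this (a sketch; the exact set of module directories varies):

```
<proj_dir>/
├── config.dev.yaml       # environment and variable configuration for dev
├── config.prod.yaml      # environment and variable configuration for prod
├── cognite_modules/      # quality-assured template modules (upgradable)
└── custom_modules/       # your own modules and edited copies
```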
The basic flow is as follows:
First you build the templates to resolve variables (as defined in
config.<env>.yaml) and gather
the modules that should be deployed. Then you deploy what was built to the CDF project environment of choice:
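Sketched schematically:

```
cognite_modules/ + custom_modules/
        |  cdf-tk build --env=<env>    (resolves variables from config.<env>.yaml)
        v
     build/
        |  cdf-tk deploy --env=<env>
        v
  CDF project
```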
Configuring what to deploy
This step describes how to configure what to deploy to each of your project environments. The environments are configured in the config.<env>.yaml files found in the root of your project directory, where <env> is the name of the environment you want to manage. By default, two environments are created: dev and prod. You can create any number of environments by copying a config.<env>.yaml file, changing the <env> part of the file name, and editing the environment settings in the file.
If you want to configure the
dev environment, you edit the
config.dev.yaml in the root of the project directory you created with
cdf-tk init <folder>.
This file is the starting point for how your project is configured. It defines a set of environments, and for each
environment, it defines which modules to deploy.
Here is a snippet of the
config.dev.yaml file that defines the environment:
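A minimal sketch of what such an environment definition can look like (the project name and module names below are placeholders, and the exact key layout may differ in your version of the tool; the individual properties are explained next):

```yaml
environments:
  dev:
    project: my-cdf-project          # must match the CDF project you deploy to
    type: dev                        # environment type; reserved for future use
    selected_modules_and_packages:
      - cdf_oid_example_data         # illustrative module names
      - cdf_infield_location
    common_function_code: ./common_function_code
```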
Set the project property to match the name of your project. This is used as a safety measure to ensure that you don't accidentally deploy to the wrong project. The type property is used to distinguish between different types of environments; it is not currently used by the cdf-tk tool, but will be used to support migrations in the future.
The selected_modules_and_packages property is a list of modules to deploy. A module can be found in any of the module directories below the cognite_modules and custom_modules directories.
The common_function_code property is the path to a directory where you can put common code that is used by your functions. The default code found in common_function_code is used to support local execution of functions.
Changing the default variables
Each module has variables you may want to change to adapt it to your project, such as the name of your default location (i.e., plant/asset/site). The configuration variables are found in the same file as the environment configuration, further down, grouped per module. You are free to delete modules that you don't need, both from the cognite_modules directory and from the config.<env>.yaml files.
Most of the variables are set to default values that work with the example data set that comes with the templates. You can deploy the configurations as they are without changing these variables, but you will probably want to adapt them to your project. Other variables are set to <change_me>; if these are not changed, some functionality will not work.
Here are the default modules and how to edit their variables. If you don't use these modules, the following serves as an example of how to configure the modules you do use.
These are the group IDs from your identity provider. The readwrite_source_id should be the group ID of the group you created for read and write access. The readonly_source_id should be the group ID of a group that administrators belong to, so they can use the Fusion UI and API to read data.
The APM data model variables should not be changed; they will be updated with new versions of the APM data model.
Each of the source_* variables should simply be the name of the system the data originates from. The defaults match the example data set that comes with the templates ("Open Industrial Data", or OID). default_location is the default location (plant/asset/site) used in the example data set. Here we use oid, but this should be something short and meaningful to your project, such as the name of your plant.
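For illustration, the module variables further down in config.<env>.yaml might look like this (the values are placeholders, not the shipped defaults):

```yaml
cdf_oid_example_data:
  default_location: oid          # short, meaningful location name
  source_workorder: my_workorder # system the work orders originate from
  source_asset: my_asset_system  # system the assets originate from
```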
Users that are members of this group in your identity provider will be able to configure Infield.
By default, the Infield location module references the cdf_oid_example_data module. This means that if you change the source_* variables there (or create your own example data module), you need to change them here as well. Note also that workorder_raw_db references the actual database and table name in RAW where the work order data is stored. So, if you changed the cdf_oid_example_data module's default_location to my_location and source_workorder to my_workorder, the workorder_raw_db value would need to be updated to match.
You also need to change the root_asset_external_id to the root asset of your project; this is the asset that is the parent of all other assets. Finally, you need to configure the group IDs from your identity provider for the different Infield roles. These will typically be new groups that you create for Infield use. The CDF groups themselves are created automatically when you deploy the configurations.
Once you have configured what to deploy and changed the variables you need to change, you can build the configurations:
cdf-tk build --clean --env=dev
This substitutes the variables in the templates and creates a build/ directory with the configurations that will be deployed. The --env=dev flag tells the tool to use the config.dev.yaml file that you edited in the previous step.
Finally, you can deploy the configurations to your CDF project. Do a test:
cdf-tk deploy --dry-run --env=dev
And then you can drop the
--dry-run to deploy for real:
cdf-tk deploy --env=dev
The deploy command does a diff against the CDF project and only deploys what has changed, updating the existing configurations in the CDF project. This ensures that run history, logs, etc. are kept.
However, if you want to deploy from a clean state, you can clean up the configurations before deploying:
cdf-tk deploy --env=dev --drop
You can even add --drop-data to also delete all the data that is managed by the configurations (this is a dangerous operation, similar to running cdf-tk clean):
cdf-tk deploy --env=dev --drop --drop-data
If you want to delete everything in your project that is managed by your configurations, you can use (with caution!):
cdf-tk clean --dry-run --env=dev
(and then drop --dry-run once you understand what it is going to do).
Once you have tried out the tool and used the templates to deploy to your CDF project, the next step is to set up a CI/CD pipeline so you can deploy to your staging and production environments as part of your development workflow. The advanced documentation explains in more detail how to use a DevOps approach and build modules for your own projects.
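As a sketch of what such a pipeline step could look like, here is a hypothetical GitHub Actions fragment (the step name, environment name, and secret names are placeholders; the credential variable names depend on your identity provider setup):

```yaml
# Hypothetical CI/CD step: build and deploy to a staging environment
- name: Deploy to staging
  env:
    # credentials picked up by cdf-tk; adjust names to your setup
    IDP_CLIENT_ID: ${{ secrets.IDP_CLIENT_ID }}
    IDP_CLIENT_SECRET: ${{ secrets.IDP_CLIENT_SECRET }}
  run: |
    cdf-tk build --env=staging
    cdf-tk deploy --env=staging
```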