Getting started

Beta

The CDF Toolkit (cdf-tk) is currently in beta. It should be stable and mature enough to bootstrap and configure Cognite Data Fusion projects. However, if you use the tool to manage production projects, we recommend that you first test your changes in a staging project so that you know what the tool is going to do. Part of the purpose of the beta is to gather feedback and improve how the tool supports project lifecycle management, so please engage with us on hub.cognite.com.

The cdf-tk tool is available as a Python package. To install it, you need a working Python installation, version 3.9 or later (3.11 recommended).

Here's a short summary of the command sequence that gets you started. The commands are explained further below and on the next page:

| Step | Command | Description |
|------|---------|-------------|
| 1 | cdf-tk init <proj_dir> | Create a new configuration folder, cd <proj_dir>, and initialise the project. |
| 2 | cdf-tk auth verify --interactive | Check that you have access to the project and create a .env file. This step can be skipped if you have already configured environment variables. Alternatively, run cdf-tk auth verify just to verify that everything works. |
| 3 | Edit config.<env>.yaml in <proj_dir> | Specify the modules you want to deploy for each environment you deploy to. config.<env>.yaml also contains all variables the modules expect. Change the variables for the modules that are relevant for your deployments. |
| 4 | cdf-tk --verbose build --env=dev | Build the configurations to the build/ directory using the config.dev.yaml configuration file. |
| 5 | cdf-tk deploy --dry-run --env=dev | Test-deploy the configurations from the build/ directory to your CDF project. Then remove --dry-run to actually push the configurations to the project. |
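Steps 4 and 5 are usually run back to back, so it can be convenient to wrap them in a small shell function. The following is a minimal sketch, not part of cdf-tk: the function name preview_deploy and the CDF_TK override variable are made up for illustration, and only the cdf-tk commands themselves come from the summary above.

```shell
# Sketch only: preview_deploy and CDF_TK are illustrative names, not part of cdf-tk.
# CDF_TK lets you point at a different executable (or a stub) when testing.

# Build the configuration for one environment, then dry-run the deployment.
# Stops at the first failing command so a broken build is never previewed.
preview_deploy() {
    env_name="${1:-dev}"
    "${CDF_TK:-cdf-tk}" --verbose build --env="$env_name" || return 1
    "${CDF_TK:-cdf-tk}" deploy --dry-run --env="$env_name" || return 1
}
```

Calling preview_deploy dev builds from config.dev.yaml and then shows what a deployment would do. Once the dry run looks right, run cdf-tk deploy --env=dev yourself to push the configurations.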

Step 1: Install the tool

To install the tool, run the following command in a terminal on your local machine.

pip install cognite-toolkit

Run cdf-tk --version to verify the version of the tool and the templates that come bundled with it. You can also run cdf-tk --help to see the available commands. Each of the commands has comprehensive help accessible using the --help option, e.g. cdf-tk auth verify --help.

tip

If your terminal reports command not found and you are using a virtual Python environment manager, make sure you have activated the virtual environment using source .venv/bin/activate, poetry, or similar.

If you are struggling with Python versions or installation, see the Pro-Tip: Managing Python versions and virtual environments section below.
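If you want to check the prerequisites in one go, a small shell helper can compare the interpreter version against the 3.9 minimum and confirm that cdf-tk is on your PATH. This is a hedged sketch: both function names are hypothetical, and it assumes the interpreter is available as python3 (adjust to python if that is what your system uses).

```shell
# Illustrative helper (not part of cdf-tk): succeeds if a version string
# such as "3.11.4" is at least 3.9, the minimum stated in this guide.
python_version_ok() {
    major="${1%%.*}"      # text before the first dot
    rest="${1#*.}"        # text after the first dot
    minor="${rest%%.*}"   # text before the next dot
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 9 ]; }
}

# Report on the interpreter and on whether cdf-tk can be found.
# Assumption: the interpreter is installed as "python3".
check_toolkit_prereqs() {
    version="$(python3 -c 'import platform; print(platform.python_version())')" || return 1
    python_version_ok "$version" || { echo "Python $version is older than 3.9"; return 1; }
    command -v cdf-tk >/dev/null 2>&1 || { echo "cdf-tk not found on PATH"; return 1; }
    echo "Python $version and cdf-tk found"
}
```

Run check_toolkit_prereqs after installation. If it reports that cdf-tk is missing, the most likely cause is an inactive virtual environment.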

Step 2: Initialise a project template directory

cdf-tk needs a local directory with configurations. Run this command to create the directory <proj_dir> and populate it with the available template modules.

cdf-tk init <proj_dir>

<proj_dir> will now contain a set of configuration files and template modules that you can pick from to configure your project.

From here on, it is easiest to run all commands from the <proj_dir> directory:

cd <proj_dir>

Step 3: Set up credentials for your project

Before using the cdf-tk tool, you need access to a CDF project. In this section, you will learn how to set up the necessary credentials to access your project.

  1. To use the cdf-tk tool, you need a client id and secret representing an application/service principal from the identity provider configured for the CDF project. This must be configured by somebody with administrator access to the CDF project. You can use any identity provider supported by CDF, such as Microsoft Entra ID (formerly Azure Active Directory) or Auth0. See Setting up an identity provider for more information.

  2. The standard CDF admin group has the access rights listed below. cdf-tk will help you create the additional groups and access rights it requires, as long as the application/service principal has been granted these access rights (e.g. by being a member of the admin group in the identity provider):

    "projectsAcl": ["LIST", "READ"],
    "groupsAcl": ["LIST", "READ", "CREATE", "UPDATE", "DELETE"]

  3. The information in the table is needed by the cdf-tk tool:

    | What? | Description | Environment variable |
    |-------|-------------|----------------------|
    | Cluster name | The physical cluster where your CDF project is (e.g. westeurope-1). | CDF_CLUSTER |
    | Project name | The CDF short project name (e.g. myproject). | CDF_PROJECT |
    | Client id | The client id of the application/service principal you created in your identity provider. | IDP_CLIENT_ID |
    | Client secret | The secret of the application/service principal you created in your identity provider. | IDP_CLIENT_SECRET |
    | Token URL | The token URL of your identity provider. | IDP_TOKEN_URL |

    Note that if you use Microsoft Entra ID, the tool only needs your tenant id, not the full URL. The tool will then create the full token URL from the tenant id.

  4. Once you have the above information, you can run the following command on a completely empty Cognite Data Fusion project (or see .env.tmpl if you want to fill in the information manually and then run cdf-tk auth verify without the --interactive flag):

cdf-tk auth verify --interactive

In the process, you will be prompted for the necessary information, and you will be asked if you want to store the information to a .env file locally.
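If you prefer to configure environment variables yourself and skip the interactive prompt, it helps to confirm that nothing is missing before running cdf-tk auth verify. The helper below is a sketch under that assumption. The function name missing_cdf_vars is hypothetical, and the variable names are the ones listed in the table above.

```shell
# Illustrative helper (not part of cdf-tk): print the names of the variables
# from the credentials table that are unset or empty, one per line.
missing_cdf_vars() {
    for name in CDF_CLUSTER CDF_PROJECT IDP_CLIENT_ID IDP_CLIENT_SECRET IDP_TOKEN_URL; do
        # Indirect lookup of the variable called $name (names are fixed above).
        eval "value=\${$name:-}"
        [ -n "$value" ] || echo "$name"
    done
}
```

An empty result means all five variables are set. If you use Microsoft Entra ID and supply a tenant id instead of a token URL, IDP_TOKEN_URL may legitimately show up as missing.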

caution

Remember that .env files can contain secrets and should never be checked into a version control repository like git. .env is already added to the .gitignore file created in your project directory.
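If you want to double-check this before your first commit, a one-line helper can look for the entry. This is a deliberately simple sketch with a hypothetical function name. It only matches a line that is exactly .env, so a project that ignores the file through a broader pattern would need git check-ignore .env instead.

```shell
# Illustrative helper (not part of cdf-tk): succeeds if the .gitignore in the
# given directory (default: current directory) has a line that is exactly ".env".
env_file_ignored() {
    grep -qxF '.env' "${1:-.}/.gitignore" 2>/dev/null
}
```

From inside <proj_dir>, env_file_ignored || echo ".env is not ignored" prints a warning only when the entry is absent.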

Step 4: Configure your project

Congratulations!

You are now ready to start configuring modules for your project. See Using templates for the next steps.

Extras and Pro-Tips

Extra: Setting up an identity provider

Cognite gives its enterprise customers full control of their CDF projects and their data, so each CDF project can be configured with a different identity provider that controls access to the project. The identity provider's role is to log users in interactively to the Cognite Data Fusion web application and to manage non-human clients (also called applications or service principals) that need to access the CDF project.

If you want to use a new Microsoft Entra ID instance, here are the steps to go through to set up Entra for your project:

  1. Create a new Entra tenant.
  2. Register Cognite API and core application.
  3. Create an Entra CDF full access/admin group for the cdf-tk service principal/application.
  4. Create an Entra application/service principal to be used by cdf-tk.
  5. Add the new application/service principal to the admin group you created.
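Once the tenant exists, its token URL follows a fixed pattern based on the tenant id. The sketch below shows the standard Microsoft identity platform v2.0 token endpoint. Whether cdf-tk builds exactly this URL internally when given only a tenant id is an assumption, and the function name is hypothetical.

```shell
# Illustrative helper (not part of cdf-tk): print the OAuth 2.0 (v2.0) token
# endpoint of the Microsoft identity platform for a given Entra tenant id.
entra_token_url() {
    printf 'https://login.microsoftonline.com/%s/oauth2/v2.0/token\n' "$1"
}
```

For example, entra_token_url "<tenant_id>" gives a value you could use for IDP_TOKEN_URL if you fill in the credentials manually.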

Alternative identity providers

There are slight variations for each identity provider, but the general steps are the same. You need OAuth2 support and a token URL that ends in /oauth/token, in addition to the client id and secret for the application/service principal set up in the identity provider.

The Identity provider documentation gives a deeper overview of how cdf-tk interacts with the identity provider.

Pro-Tip: Using a token instead of a client id and secret

cdf-tk also supports the CDF_TOKEN environment variable. If you have created an OAuth2 token in some other way, e.g. using Postman, you can set the token in this variable. Only CDF_CLUSTER and CDF_PROJECT will then be needed in addition.
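A token-based setup can be checked with a small wrapper that confirms the three variables are present, exports them, and then runs the verification command. This is a sketch only: verify_with_token and the CDF_TK override are hypothetical names, and it assumes cdf-tk auth verify picks the values up from the environment as described above.

```shell
# Illustrative helper (not part of cdf-tk). CDF_TK is a hypothetical override
# for the executable name, useful for stubbing the tool out in tests.
verify_with_token() {
    for name in CDF_TOKEN CDF_CLUSTER CDF_PROJECT; do
        # Indirect lookup of the variable called $name (names are fixed above).
        eval "value=\${$name:-}"
        [ -n "$value" ] || { echo "$name is not set" >&2; return 1; }
    done
    # Child processes only see exported variables.
    export CDF_TOKEN CDF_CLUSTER CDF_PROJECT
    "${CDF_TK:-cdf-tk}" auth verify
}
```

Set the three variables in your shell (never in a file that is checked into version control) and call verify_with_token. Tokens expire, so expect to refresh CDF_TOKEN regularly.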

Pro-Tip: Managing Python versions and virtual environments

Python is sensitive to the dependencies that are installed in your Python environment. The cdf-tk tool is built with a minimum of dependencies, but it expects, for example, the Cognite Python SDK to be close to the latest version.

When you have a working global Python version (try python --version; we recommend 3.11.x), we suggest that you install the cdf-tk tool in a virtual environment so that its dependencies do not conflict with other tools you have installed. Unless you are familiar with a virtual environment manager like poetry, we recommend using pipx.

Install pipx and then run: pipx install cognite-toolkit

This will install the cdf-tk tool in a virtual environment, but still make it available for you as a command line tool without remembering to activate the right virtual environment. If you are comfortable with a virtual environment and package manager like poetry, you don't need pipx but we still recommend using it to install the cdf-tk tool.

If you don't have a working global Python with the right version: Many systems come with a system-wide Python installation that is used by default. This may or may not be the right version (3.11 recommended). Instead of trying to upgrade the global version, it is better to install additional versions of Python and control which one is used with a Python version manager like pyenv (or an equivalent for Windows). Once you have a working version, you can install pipx and cdf-tk as described above.