Deprecated: Data Model Examples Toolkit

Deprecation notice

This page is the documentation for the old Data Model Examples Toolkit, which is based on cookiecutter. It is kept here for now as a reference for those who may still be using it.

The Data Model Examples Toolkit was a precursor to the new, more comprehensive CDF Toolkit. Once the new CDF Toolkit has stabilised, the data model examples will be available through the CDF Toolkit, and the old tool described on this page will be deprecated.

If you have been through the data modeling quickstart, you have already been exposed to the CDF toolkit. It is one of the options for loading example data into CDF and quickly getting started exploring the capabilities of CDF. As a user of the CDF toolkit, you will typically use cookiecutter like this:

pip install cookiecutter
cookiecutter https://github.com/cognitedata/data-model-examples.git

The beauty of cookiecutter is that it checks out the latest version (including the latest data sets), asks you a set of configuration questions (with defaults), and then generates a project for you that is ready to be used with the CDF project of your choice.

Beyond loading the example data, the CDF toolkit can be handy for a few other tasks when working with data sets and data models.

Adding a new data set

caution

The actual code for the toolkit is found in the {{cookiecutter.buildfolder}}/ directory; the root level is only used for cookiecutter templating.

If you want to have a data set that you can add and delete and always be sure that it is exactly the same, either for demo purposes or for testing, you can add it to your local version of the CDF toolkit. The documentation on how to add a data set is available in the GitHub repository. It also describes how to contribute a data set that will be checked in and made available globally. Of course, if you only want to have your personal data sets available to the CDF toolkit, you can follow the same recipe.

Backup and restore of the data model

The CDF toolkit can be used to back up and restore the data model. This is useful if you want to move a data model from one project to another, or if you simply want a backup of your data model. The backup uses the low-level API to retrieve the data model and store it as JSON. These JSON files can then be used to restore the data model in another project. The backup and restore functionality is available through the dump_datamodel.py and restore_datamodel.py scripts (see CONTRIBUTING).
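To give an idea of what the dump side does, the snippet below is a minimal, hypothetical sketch written against the data modeling API in the Cognite Python SDK; the toolkit's own dump_datamodel.py uses the low-level API instead, and the space and file names here are made up for illustration.

# Minimal sketch of a data model "backup", assuming an already configured
# CogniteClient and the Python SDK's data modeling API (the toolkit's own
# dump_datamodel.py uses the low-level API instead). Space and file names
# are illustrative.
import json

from cognite.client import CogniteClient


def dump_data_model(client: CogniteClient, space: str, out_file: str) -> None:
    """Write the data models, views and containers of a space to a JSON file."""
    snapshot = {
        "data_models": client.data_modeling.data_models.list(space=space, limit=None).dump(),
        "views": client.data_modeling.views.list(space=space, limit=None).dump(),
        "containers": client.data_modeling.containers.list(space=space, limit=None).dump(),
    }
    with open(out_file, "w") as f:
        json.dump(snapshot, f, indent=2)

Restore goes the other way: read the JSON files back and re-create containers, views, and data models in the target project with the corresponding apply() calls.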

Using the code for your own purposes

In the {{cookiecutter.buildfolder}}/ folder, you will find the scripts and the default templates that cookiecutter fills in for you. In the utils/ directory, you will find a set of functions that all take a CDFToolConfig instance as their first argument. The CDFToolConfig class, defined in utils.py, exposes a CDF SDK client to be used by each of the functions and loads the example data set configurations from inventory.json. It also has a convenience method, verify_client(), that checks that the client credentials are valid and that the necessary access rights are available.
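Each example data set is described by an entry in inventory.json, which the scripts read when they run. The snippet below is a purely hypothetical sketch of such an entry as it would look once loaded in Python; only the raw_db key is referenced elsewhere on this page (by load_raw() below), the example name and the other field are made up, and the real schema is documented in the repository.

# Hypothetical illustration only -- see the repository documentation for the
# actual inventory.json schema.
inventory = {
    "my_example": {
        # Name of the CDF Raw database that the load scripts create and populate.
        "raw_db": "my_example_raw",
        # Other per-example configuration (made-up field).
        "data_set": "my_example",
    }
}

# A script working on one example looks up its configuration by example name:
raw_db = inventory["my_example"]["raw_db"]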

This pattern of creating a CDFToolConfig instance and passing it into the functions that need to access CDF is a practical way to build your own functionality. It should be fairly easy to copy the code for CDFToolConfig in utils/utils.py and make your own version that creates the CDF client and lets you verify the client credentials.

def load_raw(ToolGlobals: CDFToolConfig, file: str, drop: bool) -> None:
    """Load raw data from csv files into CDF Raw

    Args:
        file: name of file to load, if empty load all files
        drop: whether to drop existing data
    """
    client = ToolGlobals.verify_client(capabilities={"rawAcl": ["READ", "WRITE"]})
    # The name of the raw database to create is picked up from the inventory.py file, which
    # again is templated with cookiecutter based on the user's input.
    raw_db = ToolGlobals.config("raw_db")
    if raw_db == "":
        print(
            f"Could not find raw_db in inventory.py for example {ToolGlobals.example}."
        )
        ToolGlobals.failed = True
        return
    try:
        if drop:
            tables = client.raw.tables.list(raw_db)
            if len(tables) > 0:
                for table in tables:
                    client.raw.tables.delete(raw_db, table.name)
            client.raw.databases.delete(raw_db)
            print(f"Deleted {raw_db} for example {ToolGlobals.example}.")
    except:
        print(
            f"Failed to delete {raw_db} for example {ToolGlobals.example}. It may not exist."
        )

As you can see from the example above, ToolGlobals, a CDFToolConfig object, is passed into load_raw(). ToolGlobals.verify_client() takes the required capabilities as input and returns a verified client (or raises an exception with a descriptive error message). ToolGlobals.config("key") retrieves a value from inventory.json. You can then use the client to perform operations against CDF.
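As a starting point for your own version, a minimal sketch of such a class might look like the following. This is not the toolkit's actual CDFToolConfig, just an illustration of the pattern: build the client once, verify it, and expose the per-example configuration from inventory.json. The environment variable names and the OAuth client-credentials flow are assumptions.

# A minimal, illustrative stand-in for CDFToolConfig -- not the toolkit's
# actual implementation. It builds a CogniteClient from environment
# variables (names are assumptions), checks that the token is valid, and
# exposes the per-example configuration loaded from inventory.json.
import json
import os

from cognite.client import ClientConfig, CogniteClient
from cognite.client.credentials import OAuthClientCredentials


class MyToolConfig:
    def __init__(self, example: str, inventory_file: str = "inventory.json") -> None:
        self.example = example
        self.failed = False
        with open(inventory_file) as f:
            self._inventory = json.load(f)
        credentials = OAuthClientCredentials(
            token_url=os.environ["CDF_TOKEN_URL"],
            client_id=os.environ["CDF_CLIENT_ID"],
            client_secret=os.environ["CDF_CLIENT_SECRET"],
            scopes=[os.environ["CDF_SCOPES"]],
        )
        self._client = CogniteClient(
            ClientConfig(
                client_name="my-tool",
                project=os.environ["CDF_PROJECT"],
                base_url=os.environ["CDF_CLUSTER_URL"],
                credentials=credentials,
            )
        )

    def verify_client(self, capabilities: dict | None = None) -> CogniteClient:
        # A real implementation would compare `capabilities` against the ACLs
        # returned by the token inspection; here we only check that the token
        # is valid for at least one CDF project.
        token = self._client.iam.token.inspect()
        if not token.projects:
            raise RuntimeError("Token is not valid for any CDF project.")
        return self._client

    def config(self, key: str) -> str:
        # Look up a value such as "raw_db" for the current example.
        return self._inventory.get(self.example, {}).get(key, "")

Functions like load_raw() above can then take an instance of this class as their first argument, just as the toolkit's own scripts do with CDFToolConfig.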