Dataiku DSS Execution Engine Best Practices

22 Abr 2026 Dataiku DSS Execution Engine Best Practices

Posted at 07:00h in Enterprise BI, ETL by Jose L.S

Dataiku DSS enables teams to design and automate end-to-end data workflows through a collaborative interface with limited coding effort. One of the most important decisions when building a pipeline is choosing the right execution engine. Dataiku offers several options, including its native DSS engine and external SQL engines, each with different trade-offs in performance, scalability, and resource usage. This article outlines how to choose between them and where a hybrid approach makes sense.

Understanding Dataiku Execution Engines

DSS Engine

The DSS engine runs computations directly in memory or on the application server, making it ideal for fast experimentation and smaller datasets.

How does it work?

When you use the DSS engine, Dataiku processes data directly within its environment. This means transformations, formulas, and even Python or R code are executed on the DSS server rather than being pushed to external systems.

Advantages

Flexibility: Supports complex transformations, custom Python or R code, and visual recipes.
Ease of use: No need to write SQL, making it accessible for non-technical users.
Transparency: Errors and logs are clearly visible within the platform, making debugging easier.
Rapid prototyping: Well-suited to testing ideas and building workflows quickly.

Limitations

Scalability: Performance can degrade with large datasets due to memory and CPU constraints.
Resource usage: Consumes DSS server resources, which may affect other users or processes.
Not optimised for big data: For large-scale operations, external engines like SQL or Spark are typically more efficient.

Error Handling

One of the advantages of the DSS engine is its transparent and customisable error handling. When a step fails, Dataiku provides detailed logs and error messages directly in the interface, often making it easier to trace the issue to a specific transformation or script. By contrast, with the SQL engine, errors are reported by the database itself, and messages can be vaguer or more technical depending on the SQL dialect and configuration. This can make debugging more difficult, particularly for users with limited SQL knowledge.

SQL Engine

With the SQL engine, transformations are delegated to the connected database, using its own optimisation and indexing.

How does it work?

When you select the SQL engine for a recipe, Dataiku translates the visual steps or formulas into SQL code. The code is then sent to the database, which executes the operations and returns the results. This approach is particularly effective when working with large datasets stored in relational databases like PostgreSQL, Oracle, or Snowflake.

Advantages

Performance: Executes directly on the database, which is optimised for handling large volumes of data.
Scalability: Benefits from the database infrastructure, including parallel processing and indexing.
Efficiency: Reduces data movement by keeping operations close to the data source.
Integration: Works well with enterprise data warehouses and cloud databases.

Limitations

Transformation constraints: Limited to operations that can be expressed in SQL, so complex logic may not be supported.
SQL dialect sensitivity: SQL syntax varies across databases, which can lead to compatibility issues.
Requires SQL knowledge: Users may need a working understanding of SQL to troubleshoot or customise queries.

DSS Engine vs SQL Engine

The table below compares the DSS and SQL engines across the main decision points:

Feature	DSS Engine	SQL Engine
Execution location	Local, in memory or on the DSS server	Push-down to the database through SQL execution
Performance	Good for small to medium datasets	Optimised for large datasets
Scalability	Limited by server memory and CPU	Scales with database infrastructure
Transformation flexibility	Supports complex logic, Python, and custom code	Limited to SQL-compatible transformations
Ease of use	Intuitive, with no SQL knowledge required	Requires an understanding of SQL and database structure
Resource usage	Consumes DSS server resources	Uses database resources
Error handling	Errors shown clearly in DSS, usually easier to trace and debug	Depends on database messages, which may be more technical or less transparent
Best use cases	Data preparation, prototyping, and small workflows	Large datasets, production pipelines, and joins
Limitations	Slower on large datasets and more memory-intensive	Less flexible, with potential SQL dialect issues

Let’s look at a few practical scenarios that show where each engine performs best.

Use Cases and Best Practices

Case 1: Operations Across Different Databases or Input-Output Types

Scenario

You need to perform transformations or joins between datasets stored in different databases, for example by combining data from a Snowflake warehouse with an SQL Server operational database. The same issue also arises when the output differs from the input. For instance, you might join two datasets from the same SQL database, but write the result to a file system. In this case, all recipes run in DSS because the inputs and outputs differ, so the SQL engine cannot be used.

Figure 1: Use case 1 – different input and output sources

Challenge

Dataiku’s SQL engine executes recipes by pushing computations to the database. However, when datasets come from different databases, SQL push-down is not possible because no single database can execute queries across both sources.

Why the DSS Engine is Used

The SQL engine cannot operate across multiple databases.
The DSS engine can handle the logic in memory or on the DSS server.
It provides the flexibility to join, filter, and transform data from different sources.

Best Practice

Where possible, consolidate the data into a single database before performing joins. For example:

Copy the relevant tables from DWH_1 to DWH_2 within the same database environment.
Use the SQL engine for downstream operations once the data has been unified.
In some cases, the DSS engine remains the only option, especially when data must be copied between different databases, for example when a master table is needed in the UAT database.

Case 2: Replacing SQL Queries with Direct Table Access

Scenario

In many workflows, users create SQL recipes with custom queries to extract or transform data. However, in a well-structured data warehouse, it is often more efficient and easier to maintain to replace these SQL queries with direct table access.

Figure 2: Use case 2 – an SQL query instead of reading the table

Challenge

Custom SQL queries can be harder to maintain, less transparent, and more limiting from an optimisation perspective. They can also make workflows more complex to debug and scale.

Best Practice

Instead of writing an SQL query inside a recipe:

Connect directly to the source table in the data warehouse. There is no need to connect every table referenced in the query, only the relevant connection is required.
Use an SQL table dataset in Dataiku rather than a query-based dataset.
Then use an SQL recipe to perform transformations on that table.

This approach ensures:

Better performance through full SQL engine push-down.
Easier debugging and monitoring.
Improved clarity and maintainability across the flow.

Example

Instead of:

SQL: SELECT * FROM TABLE_01 WHERE year = 2025

Use:

An SQL table dataset pointing to TABLE_01.
An SQL recipe that applies the filter year = 2025.

This makes the flow more transparent and allows Dataiku to optimise execution more effectively.

Case 3: Complex Logic Not Supported by SQL

Scenario

You’re building a data pipeline that includes advanced transformations, such as custom formulas, date parsing, or logic that cannot be expressed in SQL.

Figure 3: Use case 3 – logic not supported by SQL

Challenge

While the SQL engine is a great fit for standard operations, it has its limitations. Certain transformations, particularly those involving the following, cannot be translated into SQL:

Custom formulas (for example, IF, REGEX, or CASE with complex logic).
Date parsing (for example, converting strings to dates using locale-specific formats).
Text manipulation beyond regular SQL capabilities.
More than 10 CASE expressions in a DSS formula combined with concatenated IF logic.
Python or R code snippets.

In these situations, attempting to use the SQL engine will either fail or trigger an automatic fallback to the DSS engine.

Best Practice

Use the SQL engine for all operations that can be expressed in SQL.
Reserve the DSS engine for steps that require:
- Advanced formulas.
- Date and time parsing.
- Python or R code.
Split workflows: Use the SQL engine for heavy processing, then switch to DSS for the final, more complex steps.

Case 4: When Prepare Recipe Joins Cannot be Pushed to SQL

Scenario

You’re using a Prepare recipe in Dataiku to clean and transform data, and you include a join clause within that recipe, either to enrich the dataset or to apply conditional logic based on another dataset.

Figure 4: Use case 4 – join clause in Prepare recipe

Challenge

Although joins are supported in SQL, joins defined inside a Prepare recipe are not translatable into SQL by Dataiku. Prepare recipes are designed for row-level transformations and are not inherently optimised for multi-table operations like joins.

Best Practice

Avoid using joins inside Prepare recipes when working with large datasets.
Instead, use a Join recipe explicitly or an SQL recipe, which can be pushed down to the SQL engine when both datasets are in the same database.
Reserve Prepare recipes for row-level transformations like:
- Text cleaning.
- Date parsing.
- Conditional formulas.

When SQL is Not Enough

Pushing computation to the SQL engine is often the most efficient approach, but not every transformation can, or should, be translated into SQL. The key is to understand the trade-offs and choose the right execution engine for the use case at hand.

Evaluate the Benefits and Limitations

The SQL engine offers performance and scalability, but it comes with constraints on transformation complexity.
The DSS engine provides greater flexibility and supports more advanced logic, but it consumes local resources and may not scale as well for large datasets.
A hybrid approach, using SQL for heavy processing and DSS for more complex logic, can often provide the best balance.

Be Aware of Functional Differences

Some functions behave differently in DSS and SQL, which can lead to unexpected results if not carefully managed. For example:

Date parsing: SQL engines may not support locale-specific or flexible date formats, which may require the use of DSS’s Parse Date step.
String manipulation: Complex regex or conditional logic may not be supported in SQL and may therefore require execution in DSS.

Conclusion

Choosing the right engine is not just a technical detail: it has a direct impact on performance, scalability, maintainability, and the reliability of your data pipelines. Understanding where SQL performs well, where DSS adds value, and where a hybrid approach makes sense helps teams to design efficient and effective workflows.

If you’re reviewing your Dataiku workflows, ClearPeaks can help you to assess where SQL push-down makes sense, where DSS execution is the better option, and how to structure pipelines for performance, maintainability, and scale. Get in touch with our experts to discuss your use case.

Jose L.S

jose.luis@clearpeaks.com

22 Abr 2026 Dataiku DSS Execution Engine Best Practices

Understanding Dataiku Execution Engines

DSS Engine

SQL Engine

DSS Engine vs SQL Engine

Use Cases and Best Practices

Case 1: Operations Across Different Databases or Input-Output Types

Case 2: Replacing SQL Queries with Direct Table Access

Case 3: Complex Logic Not Supported by SQL

Case 4: When Prepare Recipe Joins Cannot be Pushed to SQL

When SQL is Not Enough

Evaluate the Benefits and Limitations

Be Aware of Functional Differences

Conclusion

Jose L.S

Otros Links

De Nuestro Blog