Getting Started with Notebooks Analysis Features

Join the Preview!

Advanced Analysis is in Preview. To enable it, reach out to your Customer Success Manager.

Overview

The workspace datasets

This example analysis notebook has:

  • Three data sources:

    • trade_start_logs
    • trade_execution_logs
    • trading_platform_users
  • Three derived datasets, which contain data transformed by filtering, grouping, or querying with SQL:

    • parsed_execution_logs
    • transaction_record
    • transaction_record_with_names
  • One treemap visualization.

This diagram shows the transformation and analysis cells that the data sources pass through.

A flowchart showing the steps that the data sources go through

Example walkthrough

The example starts with two log data sources:

  • trade_start_logs
  • trade_execution_logs

The next cell in the analysis notebook is the transform cell parsed_execution_logs. It uses the following grok parsing syntax to extract the transaction ID from the message column of the trade_execution_logs dataset and store it in a new column called transaction_id:

```
transaction %{notSpace:transaction_id}
```

An example of the resulting parsed_execution_logs dataset:

| timestamp | host | message | transaction_id |
| --- | --- | --- | --- |
| May 29 11:09:28.000 | shopist.internal | Executing trade for transaction 56519 | 56519 |
| May 29 10:59:29.000 | shopist.internal | Executing trade for transaction 23269 | 23269 |
| May 29 10:58:54.000 | shopist.internal | Executing trade for transaction 96870 | 96870 |
| May 31 12:20:01.152 | shopist.internal | Executing trade for transaction 80207 | 80207 |
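To check what the grok rule extracts before you build the cell, you can reproduce it locally. The following is a minimal Python sketch, not Datadog's parser. It assumes that `notSpace` behaves like the regular expression `\S+` (one or more non-whitespace characters). It uses `re.search`, so the pattern can match anywhere in the message, and the notebook's own matching rules may differ. The function name `parse_execution_log` is hypothetical.

```python
import re

# Approximation of the grok rule `transaction %{notSpace:transaction_id}`.
# Assumption: notSpace is treated as \S+ (a run of non-whitespace characters).
TRANSACTION_PATTERN = re.compile(r"transaction (?P<transaction_id>\S+)")


def parse_execution_log(row: dict) -> dict:
    """Hypothetical helper: copy `row` and add a transaction_id column parsed from message."""
    match = TRANSACTION_PATTERN.search(row["message"])
    parsed = dict(row)
    # Messages that do not match get None instead of raising an error.
    parsed["transaction_id"] = match.group("transaction_id") if match else None
    return parsed


# Sample rows taken from the parsed_execution_logs example (before parsing).
trade_execution_logs = [
    {"timestamp": "May 29 11:09:28.000", "host": "shopist.internal",
     "message": "Executing trade for transaction 56519"},
    {"timestamp": "May 29 10:59:29.000", "host": "shopist.internal",
     "message": "Executing trade for transaction 23269"},
    {"timestamp": "May 29 10:58:54.000", "host": "shopist.internal",
     "message": "Executing trade for transaction 96870"},
    {"timestamp": "May 31 12:20:01.152", "host": "shopist.internal",
     "message": "Executing trade for transaction 80207"},
]

parsed_execution_logs = [parse_execution_log(row) for row in trade_execution_logs]
for row in parsed_execution_logs:
    print(row["message"], "->", row["transaction_id"])
```

Each printed line pairs the original message with the extracted ID, for example `Executing trade for transaction 56519 -> 56519`.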

The analysis cell transaction_record uses the following SQL command to select specific columns from the trade_start_logs and trade_execution_logs datasets, rename the status value INFO to OK, and join the two datasets on transaction_id:

```sql
SELECT
    start_logs.timestamp,
    start_logs.customer_id,
    start_logs.transaction_id,
    start_logs.dollar_value,
    CASE
        WHEN executed_logs.status = 'INFO' THEN 'OK'
        ELSE executed_logs.status
    END AS status
FROM
    trade_start_logs AS start_logs
JOIN
    trade_execution_logs AS executed_logs
ON
    start_logs.transaction_id = executed_logs.transaction_id;
```

An example of the resulting transaction_record dataset:

| timestamp | customer_id | transaction_id | dollar_value | status |
| --- | --- | --- | --- | --- |
| May 29 11:09:28.000 | 92446 | 085cc56c-a54f | 838.32 | OK |
| May 29 10:59:29.000 | 78037 | b1fad476-fd4f | 479.96 | OK |
| May 29 10:58:54.000 | 47694 | cb23d1a7-c0cb | 703.71 | OK |
| May 31 12:20:01.152 | 80207 | 2c75b835-4194 | 386.21 | ERROR |
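You can try the query's behavior locally with Python's built-in `sqlite3` module. This sketch runs the transaction_record query text unchanged against in-memory tables. The table schemas are assumptions. The third trade_start_logs row (customer `11111`, transaction `no-execution`) is hypothetical. It is included to show that the inner `JOIN` drops start logs that have no matching execution log. The notebook's SQL engine is not SQLite, so dialect details may differ.

```python
import sqlite3

# In-memory stand-ins for the two datasets (schemas are assumptions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trade_start_logs (
    timestamp TEXT, customer_id TEXT, transaction_id TEXT, dollar_value REAL
);
CREATE TABLE trade_execution_logs (transaction_id TEXT, status TEXT);
INSERT INTO trade_start_logs VALUES
    ('May 29 11:09:28.000', '92446', '085cc56c-a54f', 838.32),
    ('May 31 12:20:01.152', '80207', '2c75b835-4194', 386.21),
    ('Jun 01 09:00:00.000', '11111', 'no-execution', 10.00);  -- hypothetical, unmatched
INSERT INTO trade_execution_logs VALUES
    ('085cc56c-a54f', 'INFO'),
    ('2c75b835-4194', 'ERROR');
""")

# Query text copied from the transaction_record cell.
TRANSACTION_RECORD_SQL = """
SELECT
    start_logs.timestamp,
    start_logs.customer_id,
    start_logs.transaction_id,
    start_logs.dollar_value,
    CASE
        WHEN executed_logs.status = 'INFO' THEN 'OK'
        ELSE executed_logs.status
    END AS status
FROM
    trade_start_logs AS start_logs
JOIN
    trade_execution_logs AS executed_logs
ON
    start_logs.transaction_id = executed_logs.transaction_id;
"""

rows = conn.execute(TRANSACTION_RECORD_SQL).fetchall()
# The query has no ORDER BY, so sort before printing for a stable order.
for row in sorted(rows):
    print(row)
```

Only the two matched transactions come back. The `INFO` status appears as `OK`, and `ERROR` passes through the `ELSE` branch unchanged.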

Then the reference table trading_platform_users is added as a data source:

| customer_name | customer_id | account_status |
| --- | --- | --- |
| Meghan Key | 92446 | verified |
| Anthony Gill | 78037 | verified |
| Tanya Mejia | 47694 | verified |
| Michael Kaiser | 80207 | fraudulent |

The analysis cell transaction_record_with_names runs the following SQL command to left-join the trading_platform_users reference table to the transaction_record dataset on customer_id. This appends each customer's name and account status as new columns:

```sql
SELECT tr.timestamp, tr.customer_id, tpu.customer_name, tpu.account_status,
       tr.transaction_id, tr.dollar_value, tr.status
FROM transaction_record AS tr
LEFT JOIN trading_platform_users AS tpu ON tr.customer_id = tpu.customer_id;
```

An example of the resulting transaction_record_with_names dataset:

| timestamp | customer_id | customer_name | account_status | transaction_id | dollar_value | status |
| --- | --- | --- | --- | --- | --- | --- |
| May 29 11:09:28.000 | 92446 | Meghan Key | verified | 085cc56c-a54f | 838.32 | OK |
| May 29 10:59:29.000 | 78037 | Anthony Gill | verified | b1fad476-fd4f | 479.96 | OK |
| May 29 10:58:54.000 | 47694 | Tanya Mejia | verified | cb23d1a7-c0cb | 703.71 | OK |
| May 31 12:20:01.152 | 80207 | Michael Kaiser | fraudulent | 2c75b835-4194 | 386.21 | ERROR |
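The choice of `LEFT JOIN` matters when a transaction's customer is missing from the reference table. The following sketch uses Python's `sqlite3` module to show this behavior. The table schemas are assumptions. The row for customer `11111` (transaction `aaaaaaaa-0000`) is hypothetical and has no entry in trading_platform_users. It is kept in the result with empty name and status columns (SQL `NULL`, returned as `None` in Python). An inner join would drop it instead. SQLite stands in for the notebook's SQL engine here, so dialect details may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transaction_record (
    timestamp TEXT, customer_id TEXT, transaction_id TEXT,
    dollar_value REAL, status TEXT
);
CREATE TABLE trading_platform_users (
    customer_name TEXT, customer_id TEXT, account_status TEXT
);
INSERT INTO transaction_record VALUES
    ('May 29 11:09:28.000', '92446', '085cc56c-a54f', 838.32, 'OK'),
    ('May 31 12:20:01.152', '80207', '2c75b835-4194', 386.21, 'ERROR'),
    ('Jun 01 09:00:00.000', '11111', 'aaaaaaaa-0000', 10.00, 'OK');  -- hypothetical
INSERT INTO trading_platform_users VALUES
    ('Meghan Key', '92446', 'verified'),
    ('Michael Kaiser', '80207', 'fraudulent');
""")

# Query text from the transaction_record_with_names cell.
rows = conn.execute("""
SELECT tr.timestamp, tr.customer_id, tpu.customer_name, tpu.account_status,
       tr.transaction_id, tr.dollar_value, tr.status
FROM transaction_record AS tr
LEFT JOIN trading_platform_users AS tpu ON tr.customer_id = tpu.customer_id;
""").fetchall()

# Index by customer_id so the output does not depend on row order.
by_customer = {row[1]: row for row in rows}
for customer_id in sorted(by_customer):
    print(by_customer[customer_id])
```

All three transactions appear in the output. The unmatched customer `11111` shows `None` for customer_name and account_status.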

Finally, a treemap visualization cell is created from the transaction_record_with_names dataset, filtered for status:error and grouped by dollar_value, account_status, and customer_name.
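To see which rows feed the treemap, you can approximate the filter-and-group step in plain Python. This sketch makes two assumptions. First, `status:error` is taken to select rows whose status is `ERROR`, compared case-insensitively. Second, because this page does not describe how the treemap sizes its tiles, the sketch only counts rows per `(dollar_value, account_status, customer_name)` group.

```python
from collections import Counter

# Rows taken from the transaction_record_with_names example (subset of columns).
transaction_record_with_names = [
    {"customer_name": "Meghan Key", "account_status": "verified",
     "dollar_value": 838.32, "status": "OK"},
    {"customer_name": "Anthony Gill", "account_status": "verified",
     "dollar_value": 479.96, "status": "OK"},
    {"customer_name": "Tanya Mejia", "account_status": "verified",
     "dollar_value": 703.71, "status": "OK"},
    {"customer_name": "Michael Kaiser", "account_status": "fraudulent",
     "dollar_value": 386.21, "status": "ERROR"},
]

# Filter: keep rows whose status is error (assumed case-insensitive match).
error_rows = [row for row in transaction_record_with_names
              if row["status"].lower() == "error"]

# Group: count rows for each combination of the three grouping columns.
groups = Counter(
    (row["dollar_value"], row["account_status"], row["customer_name"])
    for row in error_rows
)
for key, count in groups.items():
    print(key, count)
```

With the example data, only Michael Kaiser's fraudulent-account transaction survives the filter. It forms a single group.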


