Tag: Apache Spark

  • Converting event_list And post_event_list In Adobe Analytics Data Feed

    The event_list and post_event_list columns are heavily encoded and hard to use directly. This post shows how to convert the encoded event_list and post_event_list columns in the Adobe Analytics Data Feed into meaningful event names and split them into individual columns with Apache Spark, making the data easier to interpret and analyze.
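The decoding idea can be sketched in plain Python before moving it into a Spark job. This is only an illustration under stated assumptions: that event_list is a comma-separated string of numeric event IDs, that some entries carry a value as `id=value`, and that a lookup table maps IDs to names. `EVENT_LOOKUP`, `decode_event_list`, and `to_columns` are hypothetical names, and the mapping below is sample data rather than the content of a real lookup file.

```python
# Hypothetical sample lookup (ID -> name); a real feed ships its own lookup data.
EVENT_LOOKUP = {
    "1": "purchase",
    "2": "product_view",
    "12": "cart_add",
    "200": "event1",
}


def decode_event_list(event_list, lookup):
    """Decode a string such as '1,12,200=3' into {event name: value}.

    Entries without an explicit '=value' part are counted as 1.0.
    IDs missing from the lookup are kept under an 'unknown_<id>' name.
    """
    decoded = {}
    if not event_list:
        return decoded
    for token in event_list.split(","):
        token = token.strip()
        if not token:
            continue
        # partition() leaves value empty when the token has no '='.
        event_id, _, value = token.partition("=")
        name = lookup.get(event_id, f"unknown_{event_id}")
        decoded[name] = decoded.get(name, 0.0) + (float(value) if value else 1.0)
    return decoded


def to_columns(decoded, column_names):
    """Flatten the decoded mapping into one value per event column."""
    return [decoded.get(name, 0.0) for name in column_names]
```

In Spark, the same logic would typically run once per hit row, for example inside a UDF or through built-in string and map functions, with one output column per event name.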

  • User Retention By Days Of Access

    Instead of the traditional days-since-last-visit or cohort analysis, calculating user retention from the number of days users access a portal or app across multiple visits gives a more complete picture of retention. This post uses PySpark with the Adobe Analytics Data Feed to extract the required data, calculate the weekly and monthly access…

  • Rebuilding Adobe Analytics Full Path Report With Spark

    The full path report is missing from the new Analysis Workspace in Adobe Analytics. However, we can rebuild it using Apache Spark and data from the Adobe Analytics Data Feed by reading the hit data, filtering valid page names, grouping by visit, ordering the page sequences, removing duplicates if needed,…
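The steps named in this excerpt (filter valid page names, group by visit, order the pages, optionally remove duplicates) can be sketched in plain Python. This is a rough illustration, not the post's Spark code: the hit tuples `(visit_id, hit_order, page_name)`, the function name `full_paths`, and the choice to treat "duplicates" as consecutive repeats of the same page are all assumptions.

```python
from collections import Counter, defaultdict


def full_paths(hits, dedupe_consecutive=True, separator=" > "):
    """Count full page paths per visit.

    hits: iterable of (visit_id, hit_order, page_name) tuples (hypothetical shape).
    Returns a Counter mapping each path string to the number of visits with that path.
    """
    by_visit = defaultdict(list)
    for visit_id, hit_order, page_name in hits:
        # Keep only hits with a non-empty page name.
        if page_name:
            by_visit[visit_id].append((hit_order, page_name))

    paths = Counter()
    for pages in by_visit.values():
        # Order the pages within the visit by hit order.
        ordered = [name for _, name in sorted(pages, key=lambda p: p[0])]
        if dedupe_consecutive:
            # Drop a page when it repeats the page immediately before it.
            ordered = [
                page for i, page in enumerate(ordered)
                if i == 0 or page != ordered[i - 1]
            ]
        paths[separator.join(ordered)] += 1
    return paths
```

In Spark, the grouping and ordering would more likely be expressed with a group-by on the visit key and a sorted collected list, but the per-visit logic stays the same.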

  • Setting Up My Spark Lab

    When learning Spark and testing with small datasets, I can simply run a local Spark instance with the following command, which essentially creates a local Spark instance using all cores. This local instance has no workers, and the driver handles all jobs and tasks. However, it is more interesting and useful to run a…