How to aggregate your data?
The Aggregate function lets you group rows that share the same values and compute summary statistics for each group. Ideata Analytics lets you perform this operation from the data preparation interface.
In the top-left panel of the data preparation interface, click the Aggregate button.
Using the "columns to group by" drop-down, select the key columns that you want to group your values by. You can choose as many columns as you want and add them to the list.
From the "columns to aggregate" drop-down, select the columns whose values you want to summarize within each group. Ideata supports a number of aggregation options:
- Count: Returns the number of values for the selected column for each group
- Distinct Count: Returns the number of distinct values for the selected column for each group
- Max: Returns the largest numerical value present in the selected column for each group. Only supported in columns with numeric data types
- Min: Returns the smallest numerical value present in the selected column for each group. Only supported in columns with numeric data types
- Sum: Returns the sum of all the values for the selected column for each group. Only supported in columns with numeric data types
- Variance: Returns the population variance of the values for the selected column for each group. Only supported in columns with numeric data types
- Stddev: Returns the population standard deviation of the values for the selected column for each group. Only supported in columns with numeric data types
- Average: Returns the average for the selected column for each group. Only supported in columns with numeric data types
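To make the options above concrete, here is a minimal sketch in plain Python of what each one computes per group. It is not Ideata's implementation; the sample rows, the column names (region, product, sales) and the aggregate helper are all hypothetical and exist only for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev, pvariance

# Hypothetical sample rows; the column names are illustrative only.
rows = [
    {"region": "East", "product": "A", "sales": 10},
    {"region": "East", "product": "A", "sales": 30},
    {"region": "East", "product": "B", "sales": 20},
    {"region": "West", "product": "A", "sales": 40},
    {"region": "West", "product": "A", "sales": 40},
]


def aggregate(rows, group_by, column):
    """Group rows by the key columns and summarize one numeric column."""
    groups = defaultdict(list)
    for row in rows:
        # The group key is the combination of all "group by" column values.
        key = tuple(row[name] for name in group_by)
        groups[key].append(row[column])

    result = {}
    for key, values in groups.items():
        result[key] = {
            "count": len(values),
            "distinct_count": len(set(values)),
            "max": max(values),
            "min": min(values),
            "sum": sum(values),
            "variance": pvariance(values),  # population variance
            "stddev": pstdev(values),       # population standard deviation
            "average": mean(values),
        }
    return result


summary = aggregate(rows, group_by=["region", "product"], column="sales")
for key, stats in summary.items():
    print(key, stats)
```

Grouping by two columns yields one output row per distinct (region, product) combination, which is why the five input rows collapse into three groups here.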
Once you have selected all the required information, click the Save button to save the aggregation. This aggregates your data and replaces the existing table structure with the group-by columns and the aggregated columns.
Please note: The preview shown during the aggregation process is computed on the sample data only. Once you import the data after data preparation, the aggregation runs on the full dataset.
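The following sketch shows why preview values can differ from the final result. It assumes, purely for illustration, that the sample is the first few rows of the dataset; how Ideata actually draws its sample is not stated here, and the data and SAMPLE_SIZE are hypothetical.

```python
from collections import Counter

# Hypothetical full dataset: one "region" value per row.
full_data = ["East"] * 6 + ["West"] * 4

# Assumption: the preview works on the first few rows only.
SAMPLE_SIZE = 5
sample = full_data[:SAMPLE_SIZE]

preview_counts = Counter(sample)    # Count per group on the sample
final_counts = Counter(full_data)   # Count per group on the full dataset

print("preview:", dict(preview_counts))
print("full:   ", dict(final_counts))
```

In this example the preview shows a lower count for one group and misses the other group entirely, so treat preview figures as indicative rather than final.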