Counting Events Across Multiple Columns Without Full Joins or Concatenation

Joining Multiple Counts on the Same Table, From Different Columns?

As a data analyst or developer working with relational databases, you often encounter scenarios where you need to aggregate data from multiple columns and join them based on certain conditions. In this blog post, we’ll explore one such scenario where you want to count the number of events each staff member worked, considering different roles like barman, doorman, cloak room attendant, and keg room attendant.

Background

The provided Stack Overflow question illustrates a common problem in data analysis: aggregating multiple counts from different columns without performing full joins or using string concatenation. The goal is to create a table with the count of events each staff member worked, taking into account their roles in various events.

Understanding the Data Structure

Let’s take a closer look at the provided table structure:

+----------+----------+-------+------------+----------+
| event_id | bar      | doors | cloak_room | keg_room |
+----------+----------+-------+------------+----------+
| 2        | bob      | bill  | john       | mary     |
+----------+----------+-------+------------+----------+
| 3        | bob      | bill  | mary       | kev      |
+----------+----------+-------+------------+----------+
| 4        | bob      | john  | louise     | mary     |
+----------+----------+-------+------------+----------+
| 5        | kyle     | kev   | sarah      | louise   |
+----------+----------+-------+------------+----------+
| 6        | jennifer | bob   | jay        | john     |
+----------+----------+-------+------------+----------+
| 7        | john     | bill  | mary       | steve    |
+----------+----------+-------+------------+----------+ 
</code></pre>

The table has four columns: `event_id`, `bar`, `doors`, and `cloak_room` (and `keg_room`). Each row represents an event, with staff members assigned to different roles. The goal is to create a new table with the count of events each staff member worked.

## The Challenge

The main challenge here is to aggregate data from multiple columns without performing full joins or using string concatenation. We need to find an elegant solution that works for all columns, considering different roles and event associations.

## Solution 1: Rethinking the Data Structure

One approach is to restructure the data so that each staff member has a separate row per event, with their role included:
```markdown
+----------+-------+--------+
| event_id | person | job     |
+----------+-------+--------+
| 2        | bob    | barman  |
+----------+-------+--------+
| 2        | bob    | doorman |
+----------+-------+--------+
| 3        | bob    | barman  |
+----------+-------+--------+
...

This way, we can use aggregation to count the number of events each staff member worked.

Solution 2: Using a Query

Another approach is to use a query that combines multiple column counts using union all and aggregation:

SELECT who, COUNT(*)
FROM (
  SELECT event_id, 'bar' as job, bar as who FROM t
  UNION ALL
  SELECT event_id, 'doors' as job, doors as who FROM t
  UNION ALL
  SELECT event_id, 'cloak_room' as job, cloak_room as who FROM t
  UNION ALL
  SELECT event_id, 'keg_room' as job, keg_room as who FROM t
) jw
GROUP BY who;

This query creates a temporary result set with the combined column counts and then groups the results by staff member (who).

Solution 3: Using Lateral Joins (Oracle 12c)

For Oracle 12c users, we can use lateral joins to achieve similar results:

SELECT who, COUNT(*)
FROM t CROSS APPLY (
  SELECT event_id, 'bar' as job, bar as who FROM dual UNION ALL
  SELECT event_id, 'doors' as job, doors as who FROM dual UNION ALL
  SELECT event_id, 'cloak_room' as job, cloak_room as who FROM dual UNION ALL
  SELECT event_id, 'keg_room' as job, keg_room as who FROM dual
) jw
GROUP BY who;

This query uses a lateral join to combine the column counts and then groups the results by staff member.

Conclusion

Joining multiple counts on the same table from different columns can be achieved using various approaches. By rethinking the data structure, combining column counts with aggregation, or using lateral joins (specifically in Oracle 12c), we can create an elegant solution that works for all columns and roles. The choice of approach depends on the specific requirements and database management system being used.


Last modified on 2023-09-09