When you do data profiling you need to give examples for each data grouping, GROUP_CONCAT MySQL function is here for you
SELECT COUNT(*) AS nb, tablename.title AS datas, SUBSTRING_INDEX(GROUP_CONCAT(tablename.emp_no ORDER BY RAND(42)), ',', 10) AS Examples FROM titles AS tablename GROUP BY datas ORDER BY nb DESC;
Result:
nb | datas | Examples |
---|---|---|
115003 | Engineer | 222309,64167,287849,222887,256836,237309,262823,242428,222680,225055 |
107391 | Staff | 427715,209965,36589,74250,407627,251530,409579,254748,456807,204250 |
97750 | Senior Engineer | 238256,468681,477369,205608,224208,263874,251767,82661,245168,213359 |
92853 | Senior Staff | 83208,45313,211968,264723,36232,263944,46428,471920,66956,442742 |
15159 | Technique Leader | 260169,436055,295995,251618,434080,492772,54096,18718,97636,496663 |
15128 | Assistant Engineer | 448269,461300,443947,417358,21784,437352,412830,94900,262281,98231 |
24 | Manager | 110511,110800,110022,110303,111133,110420,110567,110854,111784,111692 |
Explainations:
COUNT(*) will count the number of records for each value grouped in the GROUP BY clause
tablename.title AS datas will show in the column "datas" the name of the current group
GROUP_CONCAT will join the string value of the first arg with coma between them, the max size of group_concat is set by group_concat_max_len variable, default is 1024chars
ORDER BY RAND(42) is the internal order of the group_concat, in this case we order by random with the seed 42, if you rerun the query you will have exactly the same random order if you use the same seed. If you want different examples each time just change the seed or use ORDER BY RAND() instead
SUBSTRING_INDEX is looking for the comma separator and will keep only everything left of the 10th comma
ORDER by nb DESC will order the final result by the number of occurrence of each data putting the most use data top of the table
SUBSTRING_INDEX is looking for the comma separator and will keep only everything left of the 10th comma
ORDER by nb DESC will order the final result by the number of occurrence of each data putting the most use data top of the table