Friday 26 April 2013

More complicated data profiling query

This is the extended version of basic-data-profiling, it is slower but gives you much more interesting informations.

When you do data profiling you need to give examples for each data grouping, GROUP_CONCAT MySQL function is here for you

SELECT COUNT(*) AS nb, tablename.title AS datas, 
 SUBSTRING_INDEX(GROUP_CONCAT(tablename.emp_no ORDER BY RAND(42)), ',', 10) AS Examples
FROM titles AS tablename
GROUP BY datas
ORDER BY nb DESC;

Result:
nb datas Examples
115003 Engineer 222309,64167,287849,222887,256836,237309,262823,242428,222680,225055
107391 Staff 427715,209965,36589,74250,407627,251530,409579,254748,456807,204250
97750 Senior Engineer 238256,468681,477369,205608,224208,263874,251767,82661,245168,213359
92853 Senior Staff 83208,45313,211968,264723,36232,263944,46428,471920,66956,442742
15159 Technique Leader 260169,436055,295995,251618,434080,492772,54096,18718,97636,496663
15128 Assistant Engineer 448269,461300,443947,417358,21784,437352,412830,94900,262281,98231
24 Manager 110511,110800,110022,110303,111133,110420,110567,110854,111784,111692


Explainations:
COUNT(*) will count the number of records for each value grouped in the GROUP BY clause
tablename.title AS datas will show in the column "datas" the name of the current group 
GROUP_CONCAT will join the string value of the first arg with coma between them, the max size of group_concat is set by group_concat_max_len variable, default is 1024chars
ORDER BY RAND(42) is the internal order of the group_concat, in this case we order by random with the seed 42, if you rerun the query you will have exactly the same random order if you use the same seed. If you want different examples each time just change the seed or use ORDER BY RAND() instead
SUBSTRING_INDEX is looking for the comma separator and will keep only  everything left of the 10th comma
ORDER by nb DESC will order the final result by the number of occurrence of each data putting the most use data top of the table

No comments:

Post a Comment