AI in Recruitment

I recently listened to an episode of the SuperDataScience podcast that may be interesting for readers of this blog: an interview with Ben Taylor of HireVue, in which he shared his thoughts and ideas on improving and automating parts of the hiring process.

Ben is working on a machine learning system that ranks candidates based on recorded recruitment interviews. If I understood correctly, the system encodes both the candidate's speech and facial expressions using state-of-the-art AI techniques (which sounds like deep neural networks to me). The goal is to capture as much predictive information as possible from the interview and to process it efficiently and fairly.

It all sounded very sophisticated, and I was impressed by the effort they must have put into building this entire pipeline: encoding audio and video data and extracting as many as 15,000 features for predictive analytics (according to their website).

However, a key part was missing from the discussion: how do they define the target for their predictive models? If the goal is to rank candidates, what is the ranking based on?

This is a central challenge in the field of people analytics. How you choose to define employee 'success' or 'performance' will determine the usefulness and the limitations of your predictive model. It doesn't matter how advanced the model is; if the target is poorly defined, the model will not produce useful predictions.

Say, for instance, that a deep neural net is trained to predict manager reviews of employee performance (a common target metric in this field). You use 15,000 features from the interview as input (predictor) variables: words used, tone of voice, facial expressivity, and so on. The model will be able to find non-linear relationships in the data and produce accurate predictions of those ratings.
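
As a toy illustration of that setup, here is a minimal sketch with synthetic data. Everything in it is an assumption: the feature encoding, the ratings, and the model are all made up, and scikit-learn's MLPRegressor stands in for an actual deep neural network.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins: rows are candidates, columns are encoded
# speech/video features (the real system claims ~15,000 of these).
n_candidates, n_features = 1000, 100
X = rng.normal(size=(n_candidates, n_features))

# The target: simulated manager performance ratings, driven by a
# handful of the features plus noise.
y = X[:, :10] @ rng.normal(size=10) + rng.normal(scale=0.5, size=n_candidates)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
model.fit(X_train, y_train)
print(f"R^2 on held-out candidates: {model.score(X_test, y_test):.2f}")
```

Whatever the model learns here is entirely determined by what the ratings encode, which is exactly where the next problem comes in.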

But what if the managers were biased? What if, for instance, they consistently gave lower ratings to employees with brown eyes? Since we used features from video recordings, there is a good chance that our neural net picked up on this and encoded the same bias in one of its hidden layers. "Brown eyes" would in effect become a feature associated with lower scores.
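
We can make this concrete with a second sketch. Again, the data is entirely synthetic and the "eye colour" feature is hypothetical; the point is only to show that a model trained on biased ratings reproduces that bias in its predictions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n, p = 2000, 50
X = rng.normal(size=(n, p))
brown_eyes = rng.integers(0, 2, size=n)   # 1 = brown eyes
X[:, 0] = brown_eyes                      # visible among the video features

# Simulated managers rate brown-eyed employees 0.8 points lower
# for no job-related reason at all.
true_skill = X[:, 1:6] @ rng.normal(size=5)
y = true_skill - 0.8 * brown_eyes + rng.normal(scale=0.3, size=n)

X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
    X, y, brown_eyes, random_state=1)
model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=1000, random_state=1)
model.fit(X_tr, y_tr)

pred = model.predict(X_te)
gap = pred[g_te == 0].mean() - pred[g_te == 1].mean()
print(f"Predicted-score gap against brown-eyed candidates: {gap:.2f}")
# A clearly positive gap should appear: the model has faithfully
# reproduced the managers' bias.
```

Note that simply dropping the explicit eye-colour column would not necessarily help: other video-derived features correlated with it can act as proxies and carry the same bias through.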

To summarize: I'm interested in what HireVue is doing, but I'm not convinced that what they're proposing will necessarily be accurate or fair in the end. But then again, the alternative is relying on humans to make the judgement...