Search The Web

Today's Headlines

Monday, March 29, 2010

Microsoft Access Tips & Tricks: Job Candidate Matching

You may be wondering what job candidates have to do with Microsoft Access. What I am going to talk about in this post is a general type of SQL query. Selecting suitable candidates for a job is a specific application of such a query. This kind of query is also referred to as "Select all X having Y", and other similar names. It matches two sets of entities based on their properties, but the number of properties of the entities is not fixed, and there is no particular order to the properties either.

If you are interested, you can find my earlier posts on finding the median, the mode, the geometric and harmonic means, ranking every row in a query, selecting random rows out of a table, calculating running sums and averages, calculating running differences, creating histograms, calculating probability masses out of given data, calculating cumulative distributions out of given data, finding percentile scores, percentile values, calculating distinct counts, full outer joins, parameter queries, crosstab queries, working with system objects, listing table fields, finding unmatched rows, and calculating statistics with grouping.

The idea behind a Job Candiate Matching query is quite simple. Let us assume that your database contains two tables. One is a table of job candidates called CandidateSkills. The table has the fields "CandidateID", and "SkillID". The table's data contains a list of candidate ID's and ID's corresponding to their various skills.

Note that this table is completely normalized, with separate rows for each candidate's individual skills rather than having a repeating field which limits the number of skills a candidate can have to whatever number of skills fields the table is designed with. Normalization is a separate topic, worthy of several posts to explain how it is done, but avoiding repeating fields is one of the first rules in normalization. For now, we just have to understand that the CandidateSkills table is designed correctly, according to normalization rules.

Obviously, the database would contain a separate table of candidates that would use CandidateID as the primary key, and store details about the candidates such as name, address, etc. Similarly, the database would also have a table of skills that would use SkillID as its primary key and contain data pertaining to each skill (such as the name of the skill, etc.). These extra tables are not important for our discussion in this post.

The data in CandidateSkills may look something like the following:

CandidateID SkillID
1 1
1 2
1 4
1 6
2 1
2 3
2 5
2 7
3 2
3 4
3 6
4 1
4 2
4 3
4 5
4 6
4 7

What the data means is that there are 4 candidates, with ID's 1, 2, 3 and 4. Moreover, candidate 1 has the skills represented by ID's 1, 2, 4 and 6. Similarly, candidate 2 has skills 1, 3, 5 and 7, candidate 3 has skills 2, 4, and 6, and candidate 4 has skills 1, 2, 3, 5, 6 and 7. Even though the data is presented, ordered in ascending order of CandidateID and within that by ascending order of SkillID, there is no such requirement for how the data is entered in the table. The table simply links various candidates with a set of skills, in no particular order.

The other table in our database is a JobSkills table. This table is very similar in structure to the CandidateSkills table. It lists the skills that are required for a list of jobs or job openings. It contains a JobID field and a SkillID field. The SkillID's mean the same in this table as they do in the CandidateSkills table - that is, a SkillID of 1 means the same whether it is a skill possessed by a candidate or a skill required for a job (obviously, SkillID in both tables is a foreign key field that is related to the primary key of a Skills table that has other details about that skill).

The JobID field is a foreign key field that is related to the primary key of a Jobs table that has other details about each job (such as designation, department, etc.). Again, note that the JobSkills table is also designed correctly with no repeating fields, but a separate row for every skill required to do a job.

The contents of the JobSkills Table might look as below:

JobID SkillID
1 1
1 2
1 5
1 6
2 1
2 2
2 3
2 4
3 4
3 5
3 6
3 7
4 1
4 3
4 5
5 2
5 4
5 6
6 1
6 4
6 7

What the data means is that there are 6 jobs, with ID's 1 through 6. Moreover, job 1 requires the skills represented by ID's 1, 2, 5 and 6. Similarly, job 2 requires skills 1, 2, 3 and 4, job 3 requires skills 4, 5, 6, and 7, job 4 requires skills 1, 3, and 5, job 5 requires skills 2, 4, and 6 and job 6 requires skills 1, 4 and 7. Even though the data is presented, ordered in ascending order of JobID and within that by ascending order of SkillID, there is no such requirement for how the data is entered in the table. The table simply links various jobs with a set of skills, in no particular order.

So, the two entities in our matching problem here are jobs and candidates. Candidates have skills, and jobs have requirements. We need to find candidates for each job such that every matching candidate for a job has all the skills required for the job. The candidate may have additional skills, but the candidate should not have any skills missing from the list required for doing the job.

As you can imagine, this kind of problem is quite common in Human Resources departments of companies, for instance. They usually have a database that lists candidates and jobs, and whenever they want to hire for a particular job, they may search for candidates that match the job's requirements. Other applications of this type of application are manufacturing facilities. Such facilities may have a set of parts required for each item they manufacture, and an inventory system that shows them which factory has which parts. They can then match the two and determine which factory should manufacture which items. Such matching problems are quite common, and we solve such problems without thinking about them very often (for instance, selecting a TV to buy based on the features we want in a TV is a similar exercise even though we may not think about using Access to solve the problem using a matching query!).

Since the problem, as presented here, is quite limited in scope because of the numbers of jobs and candidates involved, we will be able to do the matching mentally without any automation. The key to doing this is to take each job, list out its required skills, and then identify candidates that have all of those skills. Thus, for instance JobID 1 requires skills 1, 2, 5 and 6. CandidateID 1 does not match this job because he lacks skill 5. Candiates 2 and 3 are also eliminated because candidate 2 does not have skills 2 and 6, while candidate 3 does not have skills 1 and 5. We find that candidate 4 has all the skills required by job 1, and is therefore a match. Our final output would therefore have one row with JobID of 1 and CandidateID of 4. What we want is a full table that lists all such matches.

Even though it easy to do the matching easily when the number of jobs and candidates, and skills is small, the problem can become very messy when these numbers become higher. Imagine a typical company that may have a dozen or more job openings, thousands of candidates, and perhaps a list of 20 or 30 different skills to match against. That is the prime motivation behind the need for automation of such a system.

The basic idea behind an SQL query to solve the problem is quite simple. This is simply an application of unmatched queries. For each job-candidate combination, we want the unmatched query to find out if there are job requirements that the candidate does not have. If so, that combination of job and candidate is not a good match. If no such "unmatched" requirements exist, that means that the candidate has all the skills required for the job. Therefore, that combination is a good match.

Using the insight above, let us build the query step by step. The unmatched query that we need to verify a good match is shown below:

select SkillID from JobSkills where JobID = 1 and SkillID not in
(select SkillID from CandidateSkill where CandidateID = 1)

This query is obviously for checking whether candidate 1 is a good match for job 1. If this query returns no rows, then candidate 1 is a good match for job 1. If it returns rows, then the match is not good.

This query should now be used as a subquery inside an outer query that selects each combination of job and candidate and checks each for a match. A query that selects all possible combinations from two tables is a simple cartesian join between the two tables. Thus, we could write the outer query as below:

select CandidateID, JobID from CandidateSkills, JobSkills

Now, we have to combine this outer query with the previously written subquery. Moreover, we need to write out only those rows from the outer query where the subquery returns zero rows. One way to check whether a query returns rows is to return a count() from it and check for a value of zero. However, a more elegant way to do this is to use an EXISTS clause for the subquery and check whether any rows exist in the subquery. This technique is shown below:

select CandidateID, JobID from CandidateSkills as CS, JobSkills as JS
where NOT EXISTS
(select SkillID from JobSkills where JobID = JS.JobID and SkillID not in
(select SkillID from CandidateSkill where CandidateID = CS.CandidateID))

As you can see, we have aliased the tables in the outer query, and then used the aliases in the subquery to replace the hard-coded JobID of 1 and CandidateID of 1. We selected only SkillID in the subquery, but we can select any fields we want. An EXISTS subquery only cares about whether the query returns any rows or not. It does not care about the fields returned or the number of fields returned. So, we could write the above query with the subquery being a "select *" instead of just being a "select SkillID".

So, we now have an outer query that creates every possible combination of jobs and candidates using a cartesian join between the two tables. Each combination of job and candidate is then passed on to the subquery that checks if the job has any "unmatched" requirements that the candidate's skills do not cover. If no such skills exist, then the subquery returns no rows, resulting in the EXISTS clause becoming false. Since we use a "NOT EXISTS" as the WHERE clause of the outer query, whenever the EXISTS clause becomes false, the WHERE clause becomes true, and the row is printed out. Thus, we get a final result set that consists of one row for each candidate-job match. All the candidate-job combinations that are not good matches get eliminated by the WHERE clause because the EXISTS clause is true for each such combination. I hope I haven't confused more in the process of trying to explain more clearly, but I am sure you get the idea!

One can always modify the query easily if we need candidate matches for only one job or a subset of jobs, or alternatively, job matches for one or a subset of candidates. The simplest way to do this is to add a WHERE clause to the outer query. But if you know the CandidateID of a single candidate for whom you are looking for job matches, the following query would work too (in the query below, we will assume that the CandidateID of the given candidate is X):

select X as CandidateID, JobID from JobSkills as JS
where NOT EXISTS
(select SkillID from JobSkills where JobID = JS.JobID and SkillID not in
(select SkillID from CandidateSkill where CandidateID = X))

Similarly, if the JobID is X, and we are interested in finding candidates who are a good match for this single job, we could use a query like the one below:

select CandidateID, X as JobID from CandidateSkills as CS
where NOT EXISTS
(select SkillID from JobSkills where JobID = X and SkillID not in
(select SkillID from CandidateSkill where CandidateID = CS.CandidateID))

Now, as mentioned earlier, the subquery in the above queries is an unmatched query. It simply finds elements in JobSkills that don't have a match in CandidateSkills. However, as we saw in the post on unmatched queries, unmatched queries can be written the way we wrote it here, or by using a left or right join. If you are in the mood to experiment, you might want to try rewriting the subqueries in the above queries using a left or right join.

I will warn you right now though: Access will complain loudly and will not allow you to create a query without a syntax error. The Access SQL interpreter seems to have a problem with a query involving a join being used as a correlated subquery (we need a correlated subquery because we want the subquery to work specifically with the JobID and CandidateID selected in the outer query). This is one more reason why it is important to know several ways to accomplish the same thing in SQL. When you run into problems with one approach, you might be able to salvage the situation by trying a different method to get the same results.

Hope this post has been helpful in solving any problems you might have had with job-candidate-matching type problems in Access. If you have any problems or concerns with the SQL in this lesson, please feel free to let me know by posting a comment. If you have other questions on Access that you would like me to address in future lessons, please feel free to let me know through your comments too. Good luck!

No comments:

Visitors Country Map

Free counters!

Content From TheFreeDictionary.com

In the News

Article of the Day

This Day in History

Today's Birthday

Quote of the Day

Word of the Day

Match Up
Match each word in the left column with its synonym on the right. When finished, click Answer to see the results. Good luck!

 

Hangman

Spelling Bee
difficulty level:
score: -
please wait...
 
spell the word:

Search The Web