How to determine which skills to put on a resume

This page is under construction.

You have lots of skills, too many to fit on a one-page resume.
We develop skills over time, usually lots of them. Unfortunately, hiring managers don't have the time, energy, or need to hear about every skill you've developed over your lifetime, and the standard advice for job-seekers is to keep a resume to one page. When choosing which skills to list on a resume, best practice is to list the skills you possess that relate to the position. This step includes reading the job description and identifying the skills you have that would support the role.

OK. I've read the description, reduced my margins, and am using a 10pt font. What else?
With the easy steps taken care of, it's time to work for it. Our strategy: given a corpus of similar job postings, suggest skills which are not in the target description but do appear on similar listings. Here's the process:

  1. Search jobsite for the target role with requests
  2. Parse search results to find URLs of all job description HTML documents with BeautifulSoup
  3. Download and cache job description HTML
  4. Extract job description from HTML
  5. Extract features from job description with NLTK
  6. Calculate similarity between the target job-post and every other cached post (sketched below)
  7. Right outer join the target job-posting with every cached listing above a similarity threshold

$dict corpus
  From WordNet (r) 3.0 (2006) [wn]:
   corpus
    n 2: a collection of writings; "he edited the Hemingway corpus"
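
Steps 4 through 7 will get their own treatment, but to make the strategy concrete, here is a minimal sketch of steps 5 and 6, assuming the cached HTML has already been reduced to plain text. The names target_description and cached_descriptions are placeholders for data we build up over the rest of this guide, and Jaccard similarity over token sets is just one simple choice of metric.

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download('punkt')        # tokenizer model (one-time download)
nltk.download('stopwords')    # stopword list (one-time download)

STOP = set(stopwords.words('english'))

def features(text):
    """Lower-cased word tokens, minus punctuation and stopwords."""
    return {w.lower() for w in word_tokenize(text)
            if w.isalpha() and w.lower() not in STOP}

def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

# placeholder inputs: target_description is the posting we tailor the
# resume to; cached_descriptions maps jobkey -> plain-text description
target = features(target_description)
scores = {jk: jaccard(target, features(text))
          for jk, text in cached_descriptions.items()}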

Search Jobsite for target role
We're not going to use the web front-end for this; that would take too much time. Instead we'll use Python's requests module to navigate. Since downloading content from many jobsites violates their terms of service, I will use a fictitious jobsite, "infact.com", to demonstrate. An example URL for a Data Scientist role in New York, NY could look like:

#!/usr/bin/env python3
from urllib.parse import urlencode

ROLE = 'Data Scientist'
LOCATION = 'New York, NY'
QUERY = urlencode({'q': ROLE, 'l': LOCATION, 'sort': 'date'})   # percent-encodes the spaces and comma
URL = 'https://www.infact.com/jobs/?' + QUERY

Now we use the requests module to fetch the HTML from infact.com's server:

import requests

response = requests.get(URL)
response.raise_for_status()   # fail loudly on a 4xx/5xx response
HTML = response.text          # .text is a property, not a method

Since infact.com is completely fictitious, we'll imagine we have an HTML document containing the job listings for the queried role and location, sorted by date posted. Further, we can imagine this data is provided in a systematic way which links each listing to another page with a full job description. An excellent way to navigate HTML tags is with the BeautifulSoup module. Let's guess that on infact.com the job content is loaded dynamically through JavaScript, and assume that inside a <script> tag each job-listing is contained in a dictionary-like object called 'jobmap.'

import re                     # regular expressions
from bs4 import BeautifulSoup

soup = BeautifulSoup(HTML, 'html.parser')
# grab the text of the first node that mentions 'jobmap'
# (recent BeautifulSoup versions prefer string= over the older text=)
jobmap = soup.find(string=re.compile('jobmap'))

Parse search results to find URLs of all job description HTML documents
The jobmap probably contains lots of data: a unique identifier for the role to be filled, the employer, the date posted, etc. Let's call the unique identifier a jobkey, or jk for short. Although not incredibly challenging, this part can get pretty hacky. With some looping, logic, and string functions, I leave it to the reader to extract the jobkeys for each job-posting; one possible approach is sketched below. Be creative :)
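
For continuity's sake, here is one possible (and admittedly hacky) extraction. It assumes each listing in the jobmap carries its jobkey in a field that looks like jk:'1038eef0fed4b586d'; the regular expression would need adjusting to whatever infact.com actually serves.

import re

script_text = str(jobmap)          # the <script> text matched above
# hypothetical pattern: capture the hex id in each jk:'...' field
jobkeys = re.findall(r"jk\s*[:=]\s*'([0-9a-f]+)'", script_text)
jobmap = {'jk': jobkeys}           # the shape assumed below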

With a list or dictionary of jobkeys, we can use the requests module to fetch the HTML documents which contain the full job-descriptions. Let's assume we have a dictionary of the following format:

jobmap = {'jk': ['1038eef0fed4b586d',
                 '296a3d424eb8855e1',
                 '3faf9d0ffb3232d4f',
                 ...
                 'Nffffffffffffffff']}

Download and cache all job description HTML documents
The HTTP-request part will closely follow the format we used earlier. What's different is that a new URL is programmatically created for each jobkey, and the results are cached on disk.

Nota bene: Caching page-content is often a violation of a webpage's TOS. Check in with the host's TOS and with your own morality before caching.

import os
import requests

BASE_URL = 'https://www.infact.com/viewjob?jk='
os.makedirs('./cache', exist_ok=True)   # make sure the cache directory exists

for jk in jobmap['jk']:
   filename = './cache/' + jk
   if os.path.isfile(filename):         # already cached; skip the request
      continue
   URL = BASE_URL + jk
   response = requests.get(URL)
   with open(filename, 'w') as f:
      f.write(response.text)
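
One last courtesy: if a site's TOS does permit fetching, it's still polite not to hammer the server. A small pause after each requests.get() in the loop above keeps the crawl gentle:

import time

# after each fetch in the loop above:
time.sleep(1)   # pause a second between requests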