Simple Steps to Find out Job Satisfaction from Stack Overflow Annual Developer Survey 2020

Cheng
Dec 15, 2020

Hello!

Today I would like to share some of the insights I found after analyzing the dataset taken from Stack Overflow Annual Developer Survey 2020.

The link to the survey: https://insights.stackoverflow.com/survey

The steps are simple and easy to follow!

Our end goal is to build a comparative graph. A graph is often the clearest way to communicate with an audience that is not familiar with the original dataset.

Now, let’s begin!

First, let’s see what information is included in the dataset. If you have downloaded the zip file through the link, you will find two CSV files and one PDF file. In this project, I am using the survey_results_public.csv (90 MB).

As described on the Stack Overflow website, this survey includes:

“With nearly 65,000 responses fielded from over 180 countries and dependent territories, our 2020 Annual Developer Survey examines all aspects of the developer experience from career satisfaction and job search to education and opinions on open source software.”

The dataset has 64,461 rows × 61 columns. It provides a good start for any data science project, whether we want to build a machine learning model or just do exploratory data analysis (EDA). The PDF file contains the original survey questions, which will help us better understand the data.
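Since the CSV is fairly large, one option is to load only the columns needed for the analysis. The sketch below is a minimal example under assumptions: load_columns is a hypothetical helper (it is not part of the original code), and a tiny in-memory CSV with made-up rows stands in for survey_results_public.csv so that the snippet runs on its own.

```python
import io
import pandas as pd

def load_columns(source, columns):
    """Read only the requested columns from a CSV path or file-like object."""
    return pd.read_csv(source, usecols=columns)

# Toy stand-in for survey_results_public.csv (made-up rows, not survey data)
toy_csv = io.StringIO(
    "Respondent,DevType,JobSat,Country\n"
    '1,"Developer, back-end;Developer, front-end",Very satisfied,A\n'
    "2,Data scientist,Slightly dissatisfied,B\n"
)

df_small = load_columns(toy_csv, ["DevType", "JobSat"])
print(df_small.shape)
```

With the real file, the first argument would be the path to the CSV instead of the in-memory object.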

Second, we need to define two functions that will be useful in the upcoming steps:

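import pandas as pd

def preprocess(df):
    # Drop respondents who are neither satisfied nor dissatisfied
    df = df[df['Neither satisfied nor dissatisfied'] != 1].copy()
    # Collapse the four remaining answers into two 0/1 flags
    df['Satisfaction'] = df['Very satisfied'] + df['Slightly satisfied']
    df['Dissatisfaction'] = df['Very dissatisfied'] + df['Slightly dissatisfied']
    df = df.drop(columns=['Very satisfied', 'Slightly satisfied',
                          'Very dissatisfied', 'Slightly dissatisfied',
                          'Neither satisfied nor dissatisfied'])
    # Keep only respondents who answered the job satisfaction question
    df = df[(df.Satisfaction + df.Dissatisfaction) != 0]
    df = df.drop(columns=['Dissatisfaction'])
    # Count respondents per feature: row 0 = dissatisfied, row 1 = satisfied
    df = df.groupby('Satisfaction').sum().reset_index()
    return df

def plot(df):
    # Expects the output of preprocess; skip its first column (Satisfaction)
    df = df.iloc[:, 1:]
    index = list(df.columns)  # keep the original column order
    Sat = df.iloc[1]     # counts of satisfied respondents
    DisSat = df.iloc[0]  # counts of dissatisfied respondents
    df = pd.DataFrame({'Satisfaction': Sat,
                       'Dissatisfaction': DisSat}, index=index)
    ax = df.plot.bar(stacked=True)
    return ax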
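The function preprocess takes a dummy-variable DataFrame as input and returns a two-row DataFrame: one row of counts for dissatisfied respondents (Satisfaction = 0) and one for satisfied respondents (Satisfaction = 1), with one column per feature value.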

To make sure the input dummy-variable dataset has the correct form, here is the code I used to build it.

df = pd.read_csv('2020.csv')  # read in the survey dataset from the CSV file
feature = df['feature']  # select the feature; replace 'feature' with a real column name such as 'DevType'
# This is the important step: sep=';' splits cells with several answers separated by ; into one 0/1 column per answer.
dummy_feature = feature.str.get_dummies(sep=';')
# For columns whose cells hold a single answer (no ;), use this line instead:
# dummy_feature = feature.str.get_dummies()
dummy_feature.sum()  # number of respondents per answer
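To see what str.get_dummies(sep=';') does, here is a small sketch on a toy Series. The answers are made up for illustration and are not taken from the survey; a missing value stands for a skipped question.

```python
import pandas as pd

# Toy answers (made up); None stands for a respondent who skipped the question
answers = pd.Series(["Designer;Educator", "Educator", None, "Designer"])

# One 0/1 column per distinct answer; a skipped question becomes a row of zeros
dummies = answers.str.get_dummies(sep=";")
print(dummies)

# Column sums give the number of respondents who picked each answer
print(dummies.sum())
```

Each respondent can contribute to several columns, so the column sums can add up to more than the number of respondents.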

If this succeeds, you should see a count for each answer (I use the column DevType as an example).

The only additional step before applying the preprocess function is to combine the feature dummies and the job satisfaction dummies into a single DataFrame.

DevType = pd.concat([dummy_DevType, dummy_JobSat], axis=1) 
# dummy_JobSat is just another dummy dataset if you follow the previous steps :)
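As a rough illustration of how the two dummy datasets line up, the sketch below builds both from a toy DataFrame and concatenates them. The rows are made up; with the full survey, all five JobSat answers would appear as columns, whereas this toy data only produces two of them.

```python
import pandas as pd

# Toy rows (made up); the real columns would come from the survey CSV
toy = pd.DataFrame({
    "DevType": ["Designer;Educator", "Educator", "Designer"],
    "JobSat": ["Very satisfied", "Slightly dissatisfied", None],
})

dummy_DevType = toy["DevType"].str.get_dummies(sep=";")
dummy_JobSat = toy["JobSat"].str.get_dummies()  # one answer per cell, no ';' split needed

# Both frames share the respondent index, so axis=1 lines them up row by row
DevType = pd.concat([dummy_DevType, dummy_JobSat], axis=1)
print(DevType.columns.tolist())
```

The respondent with no JobSat answer ends up with zeros in every JobSat column, which is why preprocess later filters out rows where neither flag is set.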

Now you are ready to run the preprocess function! Simply run:

preprocess(DevType)
(Figure: the ideal output of preprocess(DevType).)
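To make the shape of that output concrete, here is a worked toy example. It is only a sketch: preprocess_sketch is a hypothetical, condensed restatement of the preprocess logic so the snippet runs on its own, and the five respondents are made up.

```python
import pandas as pd

SAT = ["Very satisfied", "Slightly satisfied"]
DISSAT = ["Very dissatisfied", "Slightly dissatisfied"]
NEUTRAL = "Neither satisfied nor dissatisfied"

def preprocess_sketch(df):
    """Condensed restatement of preprocess: feature counts by satisfaction."""
    df = df[df[NEUTRAL] != 1]                 # drop neutral respondents
    satisfied = df[SAT].sum(axis=1)           # 1 if satisfied, else 0
    dissatisfied = df[DISSAT].sum(axis=1)     # 1 if dissatisfied, else 0
    answered = (satisfied + dissatisfied) != 0
    features = df.drop(columns=SAT + DISSAT + [NEUTRAL])[answered]
    out = features.groupby(satisfied[answered].rename("Satisfaction")).sum()
    return out.reset_index()

# Toy dummy data (made up): five respondents, two developer types
toy = pd.DataFrame({
    "Designer":              [1, 0, 1, 1, 0],
    "Educator":              [1, 1, 0, 0, 1],
    "Very satisfied":        [1, 0, 0, 0, 0],
    "Slightly satisfied":    [0, 1, 0, 0, 0],
    "Very dissatisfied":     [0, 0, 1, 0, 0],
    "Slightly dissatisfied": [0, 0, 0, 0, 0],
    NEUTRAL:                 [0, 0, 0, 1, 0],
})

result = preprocess_sketch(toy)
print(result)
```

The neutral respondent and the one who skipped the question are both excluded, leaving one row of counts for Satisfaction = 0 and one for Satisfaction = 1.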

What’s next?

To communicate with the audience better, we need to plot our findings. Since our example features are developer types and our goal is to compare their job satisfaction, my method is to create bar plots and see if there is anything interesting. The plot function is defined right after the preprocess function in the earlier step, and it takes the output of preprocess as its input.

plot(preprocess(DevType))

Any interesting findings?

This approach is not limited to DevType. If you are interested in the factors people consider when evaluating a job offer, look at the answers to this survey question:

Imagine that you are deciding between two job offers, which 3 are MOST important to you?

The two colours represent whether people are satisfied with their current jobs or not. By looking at the plots, we therefore learn about the potential factors that may lead to a satisfying or dissatisfying job.
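Because the groups differ in size, raw counts can be hard to compare across bars. One possible refinement, not part of the original analysis, is to convert the counts into percentage shares per feature. The sketch below assumes a DataFrame in the shape that preprocess returns (row 0 for dissatisfied, row 1 for satisfied); the counts are made up.

```python
import pandas as pd

# Toy output in the shape preprocess returns (counts are made up)
counts = pd.DataFrame({
    "Satisfaction": [0, 1],
    "Designer": [20, 60],
    "Educator": [10, 10],
})

# Transpose so that each feature is a row and the two outcomes are columns
by_feature = counts.set_index("Satisfaction").T
by_feature.columns = ["Dissatisfaction", "Satisfaction"]  # row 0, row 1 of counts

# Divide each row by its total to get percentage shares
shares = by_feature.div(by_feature.sum(axis=1), axis=0) * 100
print(shares.round(1))
```

Calling shares.plot.bar(stacked=True) would then draw bars of equal height, so the satisfied share can be compared directly between features.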

Factors that may ultimately affect job satisfaction include education level, university major, the factors considered when evaluating a job offer, developer type, job-hunting strategies, professional experience, and whether someone codes as a hobby.

Overall, the original dataset gives us a lot of information about the active users on Stack Overflow, and I believe it is a good starting point if you want to find insights about the tech industry.

What drives you to look for a new job?

I hope you enjoyed reading this!

Thank you
