Top 14 Python skills for businesses | 2019 | Based on Stackoverflow Data

tl;dr

In this article we take Stackoverflow data from the past 6 months, analyze vital aspects of the Python ecosystem, and identify action points that you can address right now.

Here is a summary of the actions from the article.

Industry trends where you should be active right now.

  • If you start learning Python now, start with Python 3.x.
  • Get involved with Tensorflow.
  • Add Python tools to your data science toolkit. Start learning Pandas, Matplotlib and Numpy.
  • Need a flexible micro-framework for your web APIs? Learn Flask.

Capabilities you should be actively developing.

  • Machine learning skills with Python.
  • Computer vision with OpenCV.
  • Data collection through web scraping.
  • Python based cloud solution options in performance intensive scenarios. APIs, scripting and automation.

Domains you should consider for future skill building.

  • Experiment with Keras.
  • Already a Django user? Check out the Django rest framework.
  • Already a Python user? Consider a pip training to improve efficiency.
  • Learn data manipulation basics; Excel, CSV and SQL.

New tech to explore.

Your most relevant questions about Python

Our aim is to answer a couple of key questions about Python and its ecosystem with the use of real-life data coming from Stackoverflow. We all feel the buzz around Python lately and we might ask questions like the ones below.

  • Shall I or someone in my team start to learn Python?
  • Where shall I see the benefits of Python in my daily life/business?
  • Shall I start with Python 2.x or 3.x?
  • What is the hype around Python? I know I can do anything with it, but what makes it so special, what do others really build with it?

I compiled the article to answer your questions along practical lines. We look at the following angles.

  • I show you the number of Stackoverflow questions and their growth trends over the past 6 months for the leading programming languages, so that you can compare the trends of Python, JavaScript, Java, C# and others.
  • I introduce the key domains of Python. We’ll do this by looking at the Stackoverflow tags that are most often used with the python tag in Stackoverflow questions.
  • We look at the growth trends of the top domains we identified in the previous point and depict graphs to better visualize what’s going on.
  • Finally, we explore relevant growth segments of the Python universe and show you what actions you can take to make the next step on your Python journey.

What makes this report unique?

  • We combine two Stackoverflow data dumps from June 2018 and December 2018 and compute growth figures for this period.
  • The two dumps let you look at the growth in question views, which we treat as the best measure of real-life usage.
  • Special growth segments give you insights that we turn into an action plan right in the report.
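
The growth figures in this report boil down to comparing the same counter in two snapshots. Here is a minimal sketch of that calculation. The function name and the sample numbers are made up for illustration and are not taken from the actual dumps.

```python
def growth_percent(old_value, new_value):
    """Return the percentage change from old_value to new_value."""
    if old_value <= 0:
        raise ValueError("old_value must be positive to compute growth")
    return (new_value - old_value) / old_value * 100


# Hypothetical view counts for one tag in the two dumps (not real data).
views_june = 200_000
views_december = 250_000

print(round(growth_percent(views_june, views_december), 2))
```

A tag that did not exist in the first dump has no meaningful growth percentage, which is why the sketch rejects a zero starting value.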

I hope you are ready to explore the world of Python. Let’s get started!

The world’s leading programming languages

According to Stackoverflow data, these are the world’s leading programming languages based on the total number of questions created since 2008.

# Tag name Questions
1 javascript 1723731
2 java 1487235
3 c# 1264947
4 php 1245650
5 android 1155005
6 python 1068679
7 jquery 936264
8 html 789778
9 c++ 595978
10 ios 584135

You can read a detailed report about 2019 tech trends based on Stackoverflow data in a previous article, where you’ll find more details about the high level numbers and trends.

The above table reflects the reality of the past decade where most players migrated to web applications mainly using JavaScript (including jQuery), PHP and HTML.

The table also illustrates the heavy use of Java and C# in enterprise projects, and the expansion of mobile technology with Android and iOS.

Good old C++ holds position 9. It is still widely used in enterprise applications, and let’s not forget about game development.

Python takes 6th place with its general-purpose scripting capabilities, web frameworks, and scientific and AI toolkits.

What makes Python special

Let me show you the growth in new questions created every month as a percentage of total for the programming languages listed above.

[Figure: monthly growth in new questions as a percentage of total, per programming language]

This is the figure that puts Python into the focus of this article.

You find Python at the top this time. The growth of new Python questions created every month clearly exceeds the average growth rate of its peer group. Moreover, Python overtakes JavaScript for the first time in the history of Stackoverflow.

There is clearly something happening in the industry that makes projects turn towards Python’s capabilities in large numbers.

Let’s discover what’s behind the growing Python demand!

The numbers behind the Python ecosystem

Stackoverflow questions are usually tagged with multiple tags. This helps users find relevant questions coming from various tag based views.

Let’s look at the most common tags that are used together with the python tag in Stackoverflow questions.

# Tag name Questions
1 python 1065497
2 django 94477
3 python-3.x 83044
4 pandas 76782
5 python-2.7 56511
6 numpy 48520
7 list 34168
8 matplotlib 29592
9 dictionary 24332
10 regex 22212
11 dataframe 20465
12 flask 19775
13 tkinter 19333
14 string 18015
15 tensorflow 17713
16 csv 17615
17 arrays 17346
18 json 15367
19 selenium 12662
20 html 12348
21 beautifulsoup 11798
22 google-app-engine 11546
23 mysql 11509
24 scipy 11144
25 opencv 10872

Let’s see what these numbers tell you:

  • There are over 1 million questions tagged python on Stackoverflow. Questions usually have multiple tags assigned to them, which is why the python number is so much higher than the others.
  • The Django web framework has the top place amongst the niche tags, showing the popularity of Django in the past years.
  • Python maintains two active versions; Python 2.x and Python 3.x. Both of them are used in production systems. Python 3.x was designed to be the “future of Python”. According to the numbers the transition is well on its way.
  • Web application development with Django and Flask is a significant Python use-case.
  • Data science, statistical computing with Pandas, Numpy, Matplotlib and further packages is another key area.
  • Machine learning with Tensorflow and OpenCV is an additional important field with other solutions we’ll explore later.
  • Web scraping with BeautifulSoup and Selenium is one more common Python scenario.
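
Counting the tags that appear together with python is a simple tally. Here is a minimal sketch, assuming each question has already been parsed into a list of tags. The sample questions and the function name are hypothetical and only show the idea, not the exact process used for the table.

```python
from collections import Counter

# Hypothetical sample: each question is represented by its list of tags.
questions = [
    ["python", "pandas", "dataframe"],
    ["python", "django"],
    ["javascript", "jquery"],
    ["python", "pandas", "numpy"],
]


def co_occurring_tags(questions, main_tag):
    """Count every tag on questions that carry main_tag (main_tag included)."""
    counts = Counter()
    for tags in questions:
        if main_tag in tags:
            # set() guards against a tag being listed twice on one question.
            counts.update(set(tags))
    return counts


counts = co_occurring_tags(questions, "python")
for tag, number_of_questions in counts.most_common(3):
    print(tag, number_of_questions)
```

The main tag always lands on top because it is present on every counted question, which matches the first row of the table.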

The above table shows the number of questions created by users. The table below shows how many times questions with each tag were viewed by users on Stackoverflow.

# Tag name Views
1 python 2462915693
2 list 142398873
3 pandas 142161227
4 django 139767711
5 numpy 117853743
6 string 116990138
7 python-3.x 114085122
8 python-2.7 103820542
9 matplotlib 89430879
10 dictionary 79582148
11 pip 54533093
12 dataframe 46473150
13 arrays 43928691
14 json 39712693
15 file 38502362
16 datetime 37685124
17 flask 34957618
18 windows 33861450
19 csv 32177110
20 regex 31699530
21 tkinter 30903428
22 scipy 27193466
23 unicode 25776364
24 linux 24963227
25 opencv 23545548

View counts seem to be a great measure of real-life use, because once a question is asked, it is usually not asked again. The question remains available online so that anyone can look at it when they are looking for the solution provided in the question’s answers.

You can see subtle differences here.

  • The django tag has the highest number of questions (after the main tag python) on the first table, but it is not the most viewed tag by users. Both list and pandas get more hits.
  • Also note the high number of Windows views compared to Linux views.
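
One way to make the difference between the two tables concrete is to divide views by questions. This ratio is not part of the original analysis. It is a small illustration using the numbers from the two tables above, assuming both tables cover the same set of questions.

```python
# (questions, views) per tag, copied from the two tables above.
tag_stats = {
    "django": (94477, 139767711),
    "pandas": (76782, 142161227),
    "list": (34168, 142398873),
}

# Average number of views a single question with this tag received.
views_per_question = {
    tag: views / questions for tag, (questions, views) in tag_stats.items()
}

# Tags ordered from most to least views per question.
ranking = sorted(views_per_question, key=views_per_question.get, reverse=True)
for tag in ranking:
    print(f"{tag}: {views_per_question[tag]:.0f} views per question")
```

A high ratio suggests that a relatively small set of questions keeps getting looked up again and again.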

Having listed all major tags used together with the python tag, it’s time we look at the trends of Python domains we just identified.

Let’s plot the growth trends of the key Python domains we identified in the previous section.

  • New Python questions in total.
  • Python 2.x vs Python 3.x.
  • Web application development with Django and Flask.
  • Data science and machine learning.
  • Web scraping.

The graphs show the overall number of questions created per month for the given tag.

Please note that numbers in the previous section showed the number of questions tagged with a given tag where the question was also tagged with the python tag. This time we are looking at the number of questions tagged with a given tag disregarding all other tags.
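
The difference between the two ways of counting is easy to see in code. Here is a minimal sketch with a hypothetical sample. The data, the function name and the require_tag parameter are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical sample rows: (creation date as ISO string, list of tags).
questions = [
    ("2018-06-03", ["python", "flask"]),
    ("2018-06-17", ["flask", "docker"]),
    ("2018-07-01", ["python", "django"]),
    ("2018-07-09", ["flask"]),
]


def monthly_question_counts(questions, tag, require_tag=None):
    """Count questions per month for tag, optionally only those also tagged require_tag."""
    counts = Counter()
    for created, tags in questions:
        if tag not in tags:
            continue
        if require_tag is not None and require_tag not in tags:
            continue
        counts[created[:7]] += 1  # keep only the "YYYY-MM" part
    return dict(counts)


# Counting as in this section: the tag alone, disregarding all other tags.
print(monthly_question_counts(questions, "flask"))

# Counting as in the previous section: the tag together with the python tag.
print(monthly_question_counts(questions, "flask", require_tag="python"))
```

The first call counts more questions than the second, because it also picks up questions that carry the tag without the python tag.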

New Python questions

The creation of new Python questions has a stable upward slope; a clear sign of a buzzing ecosystem.

[Figure: new Python questions created per month]

Python 2.7 vs 3.x

Python 3.x has a higher overall question creation number and the creation curve shows continuous growth. The number of new Python 2.7 questions seems to have passed its peak.

[Figure: new Python 2.7 and Python 3.x questions created per month]

Django and Flask

Flask shows stable usage with slight growth, and Django is in a slow, steady decline. Neither of these is a high-growth segment.

[Figure: new Django and Flask questions created per month]

Data science and machine learning

This is what a clear growth curve looks like. Pandas and dataframe questions are clearly on the rise signaling that data science is a current hot topic.

The above is true for Tensorflow and machine learning in general, too.

[Figure: new data science and machine learning questions created per month]

Web scraping

With big data comes the big need for data. Web scraping is one way to collect information for data mining and machine learning projects.

Python has always been one of the go-to technologies in web scraping. (I personally chose a combination of PHP and JavaScript to run a web scraping based venture for several years, but today I would go with Python and a headless browser.)

Since Python is a popular way to process your data, it makes sense to collect your data with a Python based solution.

[Figure: new web scraping questions created per month]

Here comes the part where you find individual go-to technologies and actionable insights in the world of Python.

Growth segments of the Python ecosystem

Let’s identify the high-growth domains of Python by looking at the growth in question views in the 6 month period between June 2018 and December 2018.

Let’s look at growth percentage in the following tables.

We examine Python segments based on the number of questions a tag has:

  • Established technologies - tags with over 15k questions
  • Emerging technologies - tags with 10k to 15k questions
  • Trending technologies - tags with 5k to 10k questions
  • Top newcomers - tags with less than 5k questions
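
The four segments above can be expressed as a small lookup function. This is a minimal sketch. The function name and the handling of counts that fall exactly on a boundary (10k counts as emerging, 5k as trending) are my assumptions, since the list does not spell them out. The sample counts are taken from tables in this article.

```python
def segment(question_count):
    """Map a tag's question count to one of the four segments used in this report."""
    if question_count > 15_000:
        return "established"
    if question_count >= 10_000:  # boundary handling is an assumption
        return "emerging"
    if question_count >= 5_000:  # boundary handling is an assumption
        return "trending"
    return "newcomer"


# Question counts as listed in this article.
sample_counts = {"tensorflow": 17713, "opencv": 10872, "fast-ai": 9}

for tag, count in sample_counts.items():
    print(tag, segment(count))
```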

Growth of established technologies

These are the heavy tags that have been around for long enough to have a stable user base and mature solutions. Users look at both old and new questions to find out how to solve daily challenges.

# Tag name Views growth %
1 tensorflow 47.17
2 dataframe 41.02
3 pandas 37.27
4 python-3.x 28.28
5 flask 23.11
6 csv 22.98
7 numpy 22.88
8 arrays 21.88
9 matplotlib 21.47
10 python-2.7 20.83
11 json 20.50
12 tkinter 19.28
13 python 18.45
14 list 18.42
15 dictionary 18.15

These are industry trends you should be active in right now.

  • Tensorflow, DataFrame and Pandas are in high demand, with 37% to 47% growth in question views in 6 months.
  • Besides the above tags we find Numpy, csv, arrays, Matplotlib, list and dictionary from the data manipulation domain.
  • Flask question creation numbers were not too impressive, yet question views put Flask in fifth place. Having json on this list tells me that using Python (and Flask) to build APIs is probably a strong use case.
  • Tkinter made it to the growth list, too. This is a bit of a surprise to me, because I don’t know any client or project team who uses Tkinter to build stuff. I have some assumptions, but I prefer not to throw wild guesses into the article. Please leave a comment if you are using Tkinter in your projects and tell us what you’re building with it.

Actions you may take:

  • If you start learning Python now, start with Python 3.x.
  • Get involved with Tensorflow.
  • Add Python tools to your data science toolkit. Start learning Pandas, Matplotlib and Numpy.
  • Need a flexible micro-framework for your web APIs? Learn Flask.

Growth of emerging technologies

These tags are also established with a strong base, although somewhat smaller than the ones in the previous chapter.

# Tag name Views growth %
1 machine-learning 36.82
2 opencv 26.14
3 selenium 23.13
4 beautifulsoup 18.46
5 sqlalchemy 18.42
6 scipy 17.58
7 html 17.34
8 mysql 14.56
9 multithreading 14.39
10 linux 13.99

These are the capabilities you should be actively developing.

  • Python-based machine learning is on top with both machine-learning and opencv.
  • BeautifulSoup, selenium and html are all related to web scraping. Although Selenium is a testing framework, it is often used to scrape web sites, especially those with heavy JavaScript.
  • SQLAlchemy and mysql show that Python-based applications are primarily connected to SQL databases, although NoSQL connectors are also available.
  • The rise of multithreading and Linux suggests that Python is used in performance-intensive applications in cloud environments.

Actions you may take:

  • Build machine learning skills with Python.
  • Explore computer vision with OpenCV.
  • Get started with data collection through web scraping.
  • Consider Python based cloud solution options in performance intensive scenarios. Think APIs, scripting and automation.

Growth of trending technologies

These tags are from smaller niche segments.

# Tag name Views growth %
1 keras 75.69
2 scikit-learn 34.05
3 django-rest-framework 26.23
4 excel 24.85
5 python-requests 24.61
6 plot 23.39
7 pip 22.89
8 web-scraping 22.83
9 image 21.18
10 for-loop 20.43
11 datetime 19.83
12 loops 18.49
13 sql 18.10
14 postgresql 18.01
15 unit-testing 17.59

These are the technologies to watch, and these are the domains you should consider for future skill building.

  • Keras, the Python deep learning library, had nearly 76% growth in view numbers.
  • Django rest framework confirms our idea that many projects use Python to build web APIs.
  • The other tags are related to the domains we already mentioned in previous chapters. Please note how good old Excel made it to position 4. If you work with data, Excel is your friend.

Actions you may take:

  • Experiment with Keras.
  • Already a Django user? Check out the Django rest framework.
  • Already a Python user? Consider a pip training to improve efficiency.
  • Learn data manipulation basics; Excel, CSV and SQL.

Top newcomers

These tags appear out of thin air and may or may not become the rising stars of the next period.

# Tag name Questions Views growth %
1 fast-ai 9 10083.33
2 pipfile 14 10013.33
3 google-photos-api 7 9576.92
4 pyramid-arima 7 6943.75
5 ubuntu-18.04 55 4986.30
6 generative-adversarial-network 46 3812.15
7 transfer-learning 31 3052.17
8 dask.distributed 2 2875.76
9 yattag 2 2830.77
10 tls1.0 1 2581.82
11 debian-stretch 7 2485.71
12 tpc 1 2431.91
13 kubectl 3 2305.56
14 tensorflow-probability 8 2145.00
15 tensorflow-lite 44 2034.17

Let’s highlight a few special points.

  • Fast.ai is an amazing project, especially because it has exceptional learning materials on its website and on YouTube. If you are looking for a superb learning experience, I highly recommend these resources.
  • Pipfile aims to replace requirements.txt with a superior solution and gives us Pipfile and Pipfile.lock. The project is in active development with no stable release yet, but it still has over 2600 stars on GitHub.
  • Last, but not least, here we go with the next level of our Skynet utopia. Tensorflow Lite runs machine learning models on mobile and embedded devices.

Actions you may take:

Wrap-up

I hope you enjoyed exploring the Stackoverflow data behind Python.

Python is not only practical and capable, but it’s also fun to work with and it’s easy to learn.

Let’s summarize the actions you may take again, so that you have them all in one place.

Industry trends where you should be active right now.

  • If you start learning Python now, start with Python 3.x.
  • Get involved with Tensorflow.
  • Add Python tools to your data science toolkit. Start learning Pandas, Matplotlib and Numpy.
  • Need a flexible micro-framework for your web APIs? Learn Flask.

Capabilities you should be actively developing.

  • Machine learning skills with Python.
  • Computer vision with OpenCV.
  • Data collection through web scraping.
  • Python based cloud solution options in performance intensive scenarios. APIs, scripting and automation.

Domains you should consider for future skill building.

  • Experiment with Keras.
  • Already a Django user? Check out the Django rest framework.
  • Already a Python user? Consider a pip training to improve efficiency.
  • Learn data manipulation basics; Excel, CSV and SQL.

New tech to explore.