- tl;dr
- Your most relevant questions about Python
- The world’s leading programming languages
- The numbers behind the Python ecosystem
- Growth trends of Python’s key domains
- Growth segments of the Python ecosystem
- Wrap-up
tl;dr
In this article we take Stackoverflow data from the past 6 months, analyze vital aspects of the Python ecosystem, and identify action points that you can address right now.
Please find here the summary of the actions from the article.
Industry trends where you should be active right now.
- If you start learning Python now, start with Python 3.x.
- Get involved with Tensorflow.
- Add Python tools to your data science toolkit. Start learning Pandas, Matplotlib and Numpy.
- Need a flexible micro-framework for your web APIs? Learn Flask.
Capabilities you should be actively developing.
- Machine learning skills with Python.
- Computer vision with OpenCV.
- Data collection through web scraping.
- Python based cloud solution options in performance intensive scenarios. APIs, scripting and automation.
Domains you should consider for future skill building.
- Experiment with Keras.
- Already a Django user? Check out the Django rest framework.
- Already a Python user? Consider a pip training to improve efficiency.
- Learn data manipulation basics; Excel, CSV and SQL.
New tech to explore.
- Watch the Fast.ai videos on Youtube.
- Have a look at Tensorflow lite.
Your most relevant questions about Python
Our aim is to answer a couple of key questions about Python and its ecosystem with the use of real-life data coming from Stackoverflow. We all feel the buzz around Python lately and we might ask questions like the ones below.
- Shall I or someone in my team start to learn Python?
- Where shall I see the benefits of Python in my daily life/business?
- Shall I start with Python 2.x or 3.x?
- What is the hype around Python? I know I can do anything with it, but what makes it so special, what do others really build with it?
I compiled the article to answer your questions along practical lines. We look at the following angles.
- I show you the number of Stackoverflow questions and their growth trends over the past 6 months for the leading programming languages, so that you can compare the trends of Python, JavaScript, Java, C# and others.
- I introduce the key domains of Python. We’ll do this by looking at the Stackoverflow tags that are most often used with the `python` tag in Stackoverflow questions.
- We look at the growth trends of the top domains we identified in the previous point and depict graphs to better visualize what’s going on.
- Finally, we explore relevant growth segments of the Python universe and show you what actions you can take to make the next step on your Python journey.
What makes this report unique?
- We combine two Stackoverflow data dumps from June 2018 and December 2018 and create growth figures of this period.
- The two dumps let you look at the growth in question views, which we consider the best available measure of real-life usage.
- Special growth segments give you insights that we turn into an action plan right in the report.
Hope you are ready to explore the world of Python, let’s get started!
The world’s leading programming languages
According to Stackoverflow data, these are the world’s leading programming languages based on the total number of questions created since 2008.
# | Tag name | Questions |
---|---|---|
1 | javascript | 1723731 |
2 | java | 1487235 |
3 | c# | 1264947 |
4 | php | 1245650 |
5 | android | 1155005 |
6 | python | 1068679 |
7 | jquery | 936264 |
8 | html | 789778 |
9 | c++ | 595978 |
10 | ios | 584135 |
You can read a detailed report about 2019 tech trends based on Stackoverflow data in a previous article, where you’ll find more details about the high level numbers and trends.
The above table reflects the reality of the past decade where most players migrated to web applications mainly using JavaScript (including JQuery), PHP and HTML.
The table also illustrates the heavy use of Java and C# in enterprise projects, and the expansion of mobile technology with Android and iOS.
Good old C++ holds position 9. It’s still widely used in enterprise applications, and let’s not forget about game development.
Python takes the 6th place with its general purpose scripting capabilities, web frameworks and scientific and AI toolkit.
What makes Python special
Let me show you the growth in new questions created every month as a percentage of total for the programming languages listed above.
This is the figure that puts Python into the focus of this article.
You find Python at the top this time. The growth of new Python questions created every month clearly exceeds the average growth rate of its peer group. Moreover, Python overtakes JavaScript for the first time in the history of Stackoverflow.
There is clearly something happening in the industry that makes projects turn towards Python’s capabilities in large numbers.
Let’s discover what’s behind the growing Python demand!
The numbers behind the Python ecosystem
Stackoverflow questions are usually tagged with multiple tags. This helps users find relevant questions coming from various tag based views.
Let’s look at the most common tags that are used together with the `python` tag in Stackoverflow questions.
# | Tag name | Questions |
---|---|---|
1 | python | 1065497 |
2 | django | 94477 |
3 | python-3.x | 83044 |
4 | pandas | 76782 |
5 | python-2.7 | 56511 |
6 | numpy | 48520 |
7 | list | 34168 |
8 | matplotlib | 29592 |
9 | dictionary | 24332 |
10 | regex | 22212 |
11 | dataframe | 20465 |
12 | flask | 19775 |
13 | tkinter | 19333 |
14 | string | 18015 |
15 | tensorflow | 17713 |
16 | csv | 17615 |
17 | arrays | 17346 |
18 | json | 15367 |
19 | selenium | 12662 |
20 | html | 12348 |
21 | beautifulsoup | 11798 |
22 | google-app-engine | 11546 |
23 | mysql | 11509 |
24 | scipy | 11144 |
25 | opencv | 10872 |
Let’s see what these numbers tell you:
- There are over 1 million questions tagged `python` on Stackoverflow. Questions usually have multiple tags assigned to them, which is why the `python` number is so much higher than the others.
- The Django web framework has the top place amongst the niche tags, showing the popularity of Django in the past years.
- Python maintains two active versions; Python 2.x and Python 3.x. Both of them are used in production systems. Python 3.x was designed to be the “future of Python”. According to the numbers the transition is well on its way.
- Web application development with Django and Flask is a significant Python use-case.
- Data science, statistical computing with Pandas, Numpy, Matplotlib and further packages is another key area.
- Machine learning with Tensorflow and OpenCV is an additional important field with other solutions we’ll explore later.
- Web scraping with BeautifulSoup and Selenium is one more common Python scenario.
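A co-occurrence count like the one in the table above can be sketched in a few lines. This is only a minimal illustration, assuming each question has already been reduced to a list of its tags; the sample data and the `co_tag_counts` helper are hypothetical and not taken from the actual data dump processing.

```python
from collections import Counter

# Hypothetical sample: each question is represented by its list of tags.
questions = [
    ["python", "pandas", "dataframe"],
    ["python", "django"],
    ["javascript", "jquery"],
    ["python", "pandas"],
    ["python", "django", "python-3.x"],
]


def co_tag_counts(questions, base_tag="python"):
    """Count how often each tag appears on questions that carry base_tag."""
    counts = Counter()
    for tags in questions:
        if base_tag in tags:
            # set() guards against a tag being listed twice on one question.
            counts.update(set(tags))
    return counts


counts = co_tag_counts(questions)
for tag, n in counts.most_common():
    print(tag, n)
```

Note that the base tag itself is counted on every matching question, which mirrors why `python` tops the table by a wide margin.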
The above table shows the number of questions created by users. We have the data to look at the number of times questions with certain tags are looked at by users on Stackoverflow in the table below.
# | Tag name | Views |
---|---|---|
1 | python | 2462915693 |
2 | list | 142398873 |
3 | pandas | 142161227 |
4 | django | 139767711 |
5 | numpy | 117853743 |
6 | string | 116990138 |
7 | python-3.x | 114085122 |
8 | python-2.7 | 103820542 |
9 | matplotlib | 89430879 |
10 | dictionary | 79582148 |
11 | pip | 54533093 |
12 | dataframe | 46473150 |
13 | arrays | 43928691 |
14 | json | 39712693 |
15 | file | 38502362 |
16 | datetime | 37685124 |
17 | flask | 34957618 |
18 | windows | 33861450 |
19 | csv | 32177110 |
20 | regex | 31699530 |
21 | tkinter | 30903428 |
22 | scipy | 27193466 |
23 | unicode | 25776364 |
24 | linux | 24963227 |
25 | opencv | 23545548 |
View counts seem to be a great measure of real-life use, because once a question is asked, it is usually not asked again. The question remains available online so that anyone can look at it when they are looking for the solution provided in the question’s answers.
You can see subtle differences here.
- The `django` tag has the highest number of questions (after the main tag `python`) in the first table, but it is not the most viewed tag by users. Both `list` and `pandas` get more hits.
- Also note the high number of Windows views compared to Linux views.
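To see how a tag can lead in question count yet trail in views, here is a small sketch. It assumes each question is available as a pair of tags and view count; the numbers and the `tag_stats` helper are made up for illustration only.

```python
from collections import defaultdict

# Hypothetical sample: (tags, view_count) pairs for a handful of questions.
questions = [
    (["python", "django"], 1200),
    (["python", "django"], 800),
    (["python", "django"], 500),
    (["python", "list"], 9000),
    (["python", "pandas"], 4000),
    (["python", "pandas"], 3500),
]


def tag_stats(questions, base_tag="python"):
    """Sum question counts and views per tag for questions carrying base_tag."""
    stats = defaultdict(lambda: {"questions": 0, "views": 0})
    for tags, views in questions:
        if base_tag not in tags:
            continue
        for tag in set(tags):
            stats[tag]["questions"] += 1
            stats[tag]["views"] += views
    return stats


stats = tag_stats(questions)
niche = [tag for tag in stats if tag != "python"]

# Rank the niche tags once by number of questions and once by total views.
by_questions = sorted(niche, key=lambda t: stats[t]["questions"], reverse=True)
by_views = sorted(niche, key=lambda t: stats[t]["views"], reverse=True)

print(by_questions)
print(by_views)
```

In this toy data set `django` has the most questions, while a single heavily viewed `list` question is enough to put `list` on top of the views ranking.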
Having listed all major tags used together with the `python` tag, it’s time we look at the trends of the Python domains we just identified.
Growth trends of Python’s key domains
Let’s plot the growth trends of the key Python domains we identified in the previous section.
- New Python questions in total.
- Python 2.x vs Python 3.x.
- Web application development with Django and Flask.
- Data science and machine learning.
- Web scraping.
The graphs show the overall number of questions created per month for the given tag.
Please note that numbers in the previous section showed the number of questions tagged with a given tag where the question was also tagged with the `python` tag. This time we are looking at the number of questions tagged with a given tag disregarding all other tags.
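The per-month counting behind such graphs can be sketched as follows. This is a hedged illustration, assuming each question comes with a creation date and a tag list; the sample rows and the `monthly_counts` helper are hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical sample: (creation_date, tags) pairs.
questions = [
    (date(2018, 6, 3), ["pandas", "python"]),
    (date(2018, 6, 21), ["pandas"]),
    (date(2018, 7, 2), ["pandas", "dataframe"]),
    (date(2018, 7, 9), ["python", "flask"]),
]


def monthly_counts(questions, tag):
    """Count questions per (year, month) that carry the tag, ignoring other tags."""
    counts = Counter()
    for created, tags in questions:
        if tag in tags:
            counts[(created.year, created.month)] += 1
    # Return the months in chronological order, ready for plotting.
    return dict(sorted(counts.items()))


pandas_by_month = monthly_counts(questions, "pandas")
print(pandas_by_month)
```

Unlike a co-occurrence count, the question tagged only `pandas` is included here even though it lacks the `python` tag.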
New Python questions
The creation of new Python questions has a stable upward slope; a clear sign of a buzzing ecosystem.
Python 2.7 vs 3.x
Python 3.x has a higher overall question creation number and the creation curve shows continuous growth. The number of new Python 2.7 questions seems to have passed its peak.
Django and Flask
Flask shows stable usage with slight growth, while Django is in a slow, steady decline. Neither is a high-growth segment.
Data science and machine learning
This is what a clear growth curve looks like. Pandas and `dataframe` questions are clearly on the rise, signaling that data science is a current hot topic.
The above is true for Tensorflow and machine learning in general, too.
Web scraping
With big data comes the big need for data. Web scraping is one way to collect information for data mining and machine learning projects.
Python has always been one of the go-to technologies in web scraping. (I personally chose a combination of PHP and JavaScript to run a web scraping based venture for several years, but today I would go with Python and a headless browser.)
Since Python is a popular way to process your data, it makes sense to collect your data with a Python based solution.
Here comes the part where you find individual go-to technologies and actionable insights in the world of Python.
Growth segments of the Python ecosystem
Let’s identify the high-growth domains of Python by looking at the growth in question views in the 6 month period between June 2018 and December 2018.
Let’s look at growth percentage in the following tables.
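As a rough sketch, the growth percentage can be read as the relative change in a tag's total views between the two snapshots. The view totals and the `growth_pct` helper below are hypothetical, and the exact formula used for the tables is an assumption.

```python
def growth_pct(old, new):
    """Relative change from old to new, in percent, rounded to two decimals."""
    if old <= 0:
        raise ValueError("the earlier snapshot must have a positive value")
    return round((new - old) / old * 100, 2)


# Hypothetical view totals per tag in the two snapshots.
views_june = {"tensorflow": 1000, "flask": 2000}
views_december = {"tensorflow": 1500, "flask": 2400}

# Only tags present in both snapshots can have a growth figure.
growth = {
    tag: growth_pct(views_june[tag], views_december[tag])
    for tag in views_june
    if tag in views_december
}
print(growth)
```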
We examine Python segments based on the number of questions a tag has:
- Established technologies - tags with over 15k questions
- Emerging technologies - tags with 10k to 15k questions
- Trending technologies - tags with 5k to 10k questions
- Top newcomers - tags with less than 5k questions
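The four segments above can be expressed as a simple classification function. This is a sketch only: how the exact boundary values (5k, 10k, 15k) are assigned is an assumption, and the `segment` helper is hypothetical.

```python
def segment(question_count):
    """Map a tag's question count to one of the four segments."""
    if question_count > 15_000:
        return "established"
    if question_count >= 10_000:  # assumption: 10k and 15k count as emerging
        return "emerging"
    if question_count >= 5_000:  # assumption: exactly 5k counts as trending
        return "trending"
    return "newcomer"


# Question counts for tensorflow, opencv and fast-ai come from the tables in
# this article; the 7,500 figure is a hypothetical placeholder.
examples = {
    "tensorflow": 17_713,
    "opencv": 10_872,
    "some-trending-tag": 7_500,
    "fast-ai": 9,
}
segments = {tag: segment(count) for tag, count in examples.items()}
print(segments)
```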
Growth of established technologies
These are the heavy tags that have been around for long enough to have a stable user base and mature solutions. Users look at both old and new questions to find out how to solve daily challenges.
# | Tag name | Views Growth % |
---|---|---|
1 | tensorflow | 47.17 |
2 | dataframe | 41.02 |
3 | pandas | 37.27 |
4 | python-3.x | 28.28 |
5 | flask | 23.11 |
6 | csv | 22.98 |
7 | numpy | 22.88 |
8 | arrays | 21.88 |
9 | matplotlib | 21.47 |
10 | python-2.7 | 20.83 |
11 | json | 20.50 |
12 | tkinter | 19.28 |
13 | python | 18.45 |
14 | list | 18.42 |
15 | dictionary | 18.15 |
These are industry trends you should be active in right now.
- Tensorflow, DataFrame and Pandas are in high demand, experiencing around 40% growth in question views in 6 months.
- Besides the above tags we find `numpy`, `csv`, `arrays`, `matplotlib`, `list` and `dictionary` from the data manipulation domain.
- Flask question creation numbers were not too impressive, yet question views put Flask into the fifth place. Having `json` on this list tells me that using Python (and Flask) to build APIs is probably a strong use case.
- `tkinter` made it to the growth list, too. This is a bit of a surprise to me, because I don’t know any client or project team who uses Tkinter to build stuff. I have some assumptions, but I prefer not to throw wild guesses into the article. Please leave a comment if you are using Tkinter in your projects and tell us what you’re building with it.
Actions you may take:
- If you start learning Python now, start with Python 3.x.
- Get involved with Tensorflow.
- Add Python tools to your data science toolkit. Start learning Pandas, Matplotlib and Numpy.
- Need a flexible micro-framework for your web APIs? Learn Flask.
Growth of emerging technologies
These tags are also established with a strong base, although somewhat smaller than the ones in the previous chapter.
# | Tag name | Views Growth % |
---|---|---|
1 | machine-learning | 36.82 |
2 | opencv | 26.14 |
3 | selenium | 23.13 |
4 | beautifulsoup | 18.46 |
5 | sqlalchemy | 18.42 |
6 | scipy | 17.58 |
7 | html | 17.34 |
8 | mysql | 14.56 |
9 | multithreading | 14.39 |
10 | linux | 13.99 |
These are the capabilities you should be actively developing.
- Python based machine learning is on top with both `machine-learning` and `opencv`.
- `beautifulsoup`, `selenium` and `html` are all related to web scraping. Although Selenium is a testing framework, it is often used to scrape web sites, especially those with heavy JavaScript.
- `sqlalchemy` and `mysql` show that Python based applications are primarily connected to SQL databases, although NoSQL connectors are also available.
- The rise of `multithreading` and `linux` suggests that Python is used in performance intensive applications in cloud environments.
Actions you may take:
- Build machine learning skills with Python.
- Explore computer vision with OpenCV.
- Get started with data collection through web scraping.
- Consider Python based cloud solution options in performance intensive scenarios. Think APIs, scripting and automation.
Growth of trending technologies
These tags are from smaller niche segments.
# | Tag name | Views Growth % |
---|---|---|
1 | keras | 75.69 |
2 | scikit-learn | 34.05 |
3 | django-rest-framework | 26.23 |
4 | excel | 24.85 |
5 | python-requests | 24.61 |
6 | plot | 23.39 |
7 | pip | 22.89 |
8 | web-scraping | 22.83 |
9 | image | 21.18 |
10 | for-loop | 20.43 |
11 | datetime | 19.83 |
12 | loops | 18.49 |
13 | sql | 18.10 |
14 | postgresql | 18.01 |
15 | unit-testing | 17.59 |
These are the technologies to watch, and these are the domains you should consider for future skill building.
- Keras, the Python deep learning library, had a 75% growth in view numbers.
- Django rest framework confirms our idea that many projects use Python to build web APIs.
- The other tags are related to the domains we already mentioned in previous chapters. Please note how good old Excel made it to position 4. If you work with data, Excel is your friend.
Actions you may take:
- Experiment with Keras.
- Already a Django user? Check out the Django rest framework.
- Already a Python user? Consider a pip training to improve efficiency.
- Learn data manipulation basics; Excel, CSV and SQL.
Top newcomers
These tags appear out of thin air and may or may not become the rising stars of the next period.
# | Tag name | Questions | Views Growth % |
---|---|---|---|
1 | fast-ai | 9 | 10083.33 |
2 | pipfile | 14 | 10013.33 |
3 | google-photos-api | 7 | 9576.92 |
4 | pyramid-arima | 7 | 6943.75 |
5 | ubuntu-18.04 | 55 | 4986.30 |
6 | generative-adversarial-network | 46 | 3812.15 |
7 | transfer-learning | 31 | 3052.17 |
8 | dask.distributed | 2 | 2875.76 |
9 | yattag | 2 | 2830.77 |
10 | tls1.0 | 1 | 2581.82 |
11 | debian-stretch | 7 | 2485.71 |
12 | tpc | 1 | 2431.91 |
13 | kubectl | 3 | 2305.56 |
14 | tensorflow-probability | 8 | 2145.00 |
15 | tensorflow-lite | 44 | 2034.17 |
Let’s highlight a few special points.
- Fast.ai is an amazing project, especially because it has exceptional learning materials on its website and on Youtube. If you are looking for a superb learning experience I highly recommend these resources.
- Pipfile aims to replace `requirements.txt` with a superior solution and give us `Pipfile` and `Pipfile.lock`. The project is in active development with no stable release yet, but it already has over 2600 stars on GitHub.
- Last, but not least, here we go with the next level of our Skynet utopia. Tensorflow lite runs machine learning models on mobile and embedded devices.
Actions you may take:
- Watch the Fast.ai videos on Youtube.
- Have a look at Tensorflow lite.
Wrap-up
I hope you enjoyed the steps as we discovered Stackoverflow data behind Python.
Python is not only practical and capable, but it’s also fun to work with and it’s easy to learn.
Let’s summarize the actions you may take again, so that you have them all in one place.
Industry trends where you should be active right now.
- If you start learning Python now, start with Python 3.x.
- Get involved with Tensorflow.
- Add Python tools to your data science toolkit. Start learning Pandas, Matplotlib and Numpy.
- Need a flexible micro-framework for your web APIs? Learn Flask.
Capabilities you should be actively developing.
- Machine learning skills with Python.
- Computer vision with OpenCV.
- Data collection through web scraping.
- Python based cloud solution options in performance intensive scenarios. APIs, scripting and automation.
Domains you should consider for future skill building.
- Experiment with Keras.
- Already a Django user? Check out the Django rest framework.
- Already a Python user? Consider a pip training to improve efficiency.
- Learn data manipulation basics; Excel, CSV and SQL.
New tech to explore.
- Watch the Fast.ai videos on Youtube.
- Have a look at Tensorflow lite.