OPINION

Bye-bye Python. Hello Julia!

As Python’s lifetime grinds to a halt, a hot new competitor is emerging

Rhea Moutafis

May 1 · 8 min read

Don’t get me wrong. Python’s popularity is still backed by a rock-solid community of computer scientists, data scientists and AI specialists.

But if you’ve ever been at a dinner table with these people, you also know how much they rant about the weaknesses of Python. From being slow to requiring excessive testing, to producing runtime errors despite prior testing — there’s enough to be pissed off about.

Which is why more and more programmers are adopting other languages — the top players being Julia, Go, and Rust. Julia is great for mathematical and technical tasks, while Go is awesome for modular programs, and Rust is the top choice for systems programming.

Since data scientists and AI specialists deal with lots of mathematical problems, Julia is the winner for them. And even upon critical scrutiny, Julia has upsides that Python can’t beat.

The Zen of Python versus the Greed of Julia

When people create a new programming language, they do so because they want to keep the good features of old languages and fix the bad ones.

In this sense, Guido van Rossum created Python in the late 1980s to improve ABC. The latter was too perfect for a programming language — while its rigidity made it easy to teach, it was hard to use in real life.

In contrast, Python is quite pragmatic. You can see this in the Zen of Python, which reflects the intention that the creators have:

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
[...]

Python still kept the good features of ABC: Readability, simplicity, and beginner-friendliness for example. But Python is far more robust and adapted to real life than ABC ever was.

In the same sense, the creators of Julia want to keep the good parts of other languages and ditch the bad ones. But Julia is a lot more ambitious: instead of replacing one language, it wants to beat them all.

This is how Julia’s creators say it:

We are greedy: we want more. We want a language that's open source, with a liberal license. We want the speed of C with the dynamism of Ruby. We want a language that's homoiconic, with true macros like Lisp, but with obvious, familiar mathematical notation like Matlab. We want something as usable for general programming as Python, as easy for statistics as R, as natural for string processing as Perl, as powerful for linear algebra as Matlab, as good at gluing programs together as the shell. Something that is dirt simple to learn, yet keeps the most serious hackers happy. We want it interactive and we want it compiled.

Julia wants to blend all upsides that currently exist, and not trade them off for the downsides in other languages. And even though Julia is a young language, it has already achieved a lot of the goals that the creators set.

What Julia developers are loving

Versatility

Julia can be used for everything from simple machine learning applications to enormous supercomputer simulations. To some extent, Python can do this, too — but Python somehow grew into the job.

In contrast, Julia was built precisely for this stuff. From the bottom up.

Speed

Julia’s creators wanted to make a language that is as fast as C — but what they created is even faster. Even though Python has become easier to speed up in recent years, its performance is still a far cry from what Julia can do.

In 2017, Julia even joined the Petaflop Club: the small group of languages that have exceeded one petaflop (10^15 floating-point operations per second) at peak performance. Apart from Julia, only C, C++ and Fortran are in the club right now.

Community

At roughly 30 years old, Python has an enormous and supportive community. There is hardly a Python-related question that you can’t get answered within one Google search.

In contrast, the Julia community is pretty tiny. While this means that you might need to dig a bit further to find an answer, you might link up with the same people again and again. And this can turn into programmer-relationships that are beyond value.

Code conversion

You don’t even need to know a single Julia command to code in Julia. Not only can you use Python and C code within Julia, you can even use Julia within Python!

Needless to say, this makes it extremely easy to patch up the weaknesses of your Python code. Or to stay productive while you’re still getting to know Julia.

Libraries

This is one of the strongest points of Python — its zillion well-maintained libraries. Julia doesn’t have many libraries, and users have complained that they’re not amazingly maintained (yet).

But when you consider that Julia is a very young language with limited resources, the number of libraries it already has is pretty impressive. Besides the fact that the number of Julia libraries is growing, Julia can also interface with libraries from C and Fortran to handle plots, for example.

Dynamic and static types

Python is 100% dynamically typed. This means that the program decides at runtime whether a variable is a float or an integer, for example.

While this is extremely beginner-friendly, it also introduces a whole host of possible bugs. This means that you need to test Python code in all possible scenarios — which is quite a dumb task that takes a lot of time.
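Here is a small sketch of what that looks like in practice. The function name `average` and the sample inputs are made up for illustration; the point is only that a type mismatch surfaces when the offending line actually runs, not before.

```python
def average(values):
    # Nothing here declares what `values` must contain; Python checks
    # the element types only when sum() actually executes.
    return sum(values) / len(values)


print(average([1, 2, 3]))     # integers work
print(average([1.5, 2.5]))    # floats work too

try:
    average(["1", "2", "3"])  # strings are rejected only at runtime
except TypeError as err:
    print("runtime error:", err)
```

A test suite has to exercise the string case explicitly to catch this, which is the extra testing burden described above.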

Since Julia’s creators also wanted it to be easy to learn, Julia fully supports dynamic typing. But in contrast to Python, you can introduce static types if you like, in the way they are present in C or Fortran, for example.

This can save you a ton of time: Instead of finding excuses for not testing your code, you can specify the type wherever it makes sense.

The data: Invest in things while they’re small

While all these things sound pretty great, it’s important to keep in mind that Julia is still tiny compared to Python.

One pretty good metric is the number of questions on StackOverflow: At this point in time, Python is tagged about twenty times more often than Julia!

This doesn’t mean that Julia is unpopular — rather, it’s naturally taking some time to get adopted by programmers.

Think about it — would you really want to write your whole code in a different language? No, you’d rather try a new language in some future project. This creates a time lag that every programming language faces between its release and its adoption.

But if you adopt it now — which is easy because Julia allows an enormous amount of language conversion — you’re investing in the future. As more and more people adopt Julia, you’ll already have gained enough experience to answer their questions. Also, your code will be more durable as more and more Python code is replaced by Julia.

Bottom line: Do Julia and let it be your edge

Forty years ago, artificial intelligence was nothing but a niche phenomenon. The industry and investors didn’t believe in it, and many technologies were clunky and hard to use. But those who learned it back then are the giants of today — those that are so high in demand that their salary matches that of an NFL player.

Similarly, Julia is still very niche now. But when it grows, the big winners will be those who adopted it early.

I’m not saying that you’re guaranteed to make a shitload of money in ten years if you adopt Julia now. But you’re increasing your chances.

Think about it: Most programmers out there have Python on their CV. And in the next few years, we’ll see even more Python programmers on the job market. But if enterprise demand for Python slows, the prospects for Python programmers are going to go down. Slowly at first, but inevitably.

On the other hand, you have a real edge if you can put Julia on your CV. Because let’s be honest, what distinguishes you from any other Pythonista out there? Not much. But there won’t be that many Julia programmers out there, even in three years’ time.

With Julia-skills, not only are you showing that you have interests beyond the job requirements. You’re also demonstrating that you’re eager to learn and that you have a broader sense of what it means to be a programmer. In other words, you’re fit for the job.

You — and the other Julia programmers — are future rockstars, and you know it. Or, as Julia’s creators said it in 2012:

Even though we recognize that we are inexcusably greedy, we still want to have it all. About two and a half years ago, we set out to create the language of our greed. It's not complete, but it's time for a 1.0 release — the language we've created is called Julia. It already delivers on 90% of our ungracious demands, and now it needs the ungracious demands of others to shape it further. So, if you are also a greedy, unreasonable, demanding programmer, we want you to give it a try.

Python is still insanely popular. But if you learn Julia now, that could be your golden ticket later on. In this sense: Bye-bye Python. Hello Julia!

Python Lambda Expressions in Data Science

Upgrade your python coding standards to upgrade your research

S Ahmad

Sep 2 · 3 min read

Coding efficiently is one of the key reasons people use Python, and lambda expressions fit right in. Python lambdas are anonymous functions with a small, concise syntax, whereas regular functions can at times be too descriptive and quite long.

Python is one of a few languages that had lambda functions added to their syntax, whereas other languages, like Haskell, use lambda expressions as a core concept.

Whatever your use case for a lambda function, it’s really good to know what they’re about and how to use them.

Why Use Lambda Functions?

The true power of a lambda function shows when it is used inside another function, but let’s start with the easy step.

Say you have a function definition that takes one argument and adds 10 to it:

def identity(x):
    return x + 10

However this can be compressed into a simple one-liner as follows:

identity = lambda a : a + 10

This function can then be used as follows:

identity(10)

which will give the answer 20.

Now with this simple concept, we can also extend this to have more than one input as follows:

myfunc = lambda a, b, c : a + b + c

So the following:

myfunc(2,3,4)

will return 9. It’s really that simple!

Now a really cool use case of Lambda expressions occurs when you use lambda functions within functions. Take the following example:

def myfunc(n):
   return lambda a : a * n

Here, the function myfunc returns a lambda function which multiplies the input a by a pre-defined integer, n. This allows the user to create functions on the fly:

mydoubler = myfunc(2)
mytripler = myfunc(3)

As can be seen, mydoubler is a function that simply multiplies an input by 2, whereas mytripler multiplies an input by 3. Test it out!

print(mydoubler(11))
print(mytripler(11))

This brings about the answers 22 and 33.
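Lambdas are also handy in the other direction: passed as an argument to a function that expects a function. The sketch below uses the built-ins `sorted`, `filter` and `map`; the sample data and variable names are invented for illustration.

```python
pairs = [("a", 3), ("b", 1), ("c", 2)]

# sorted() accepts a key function; the lambda is written right where it is used.
by_number = sorted(pairs, key=lambda p: p[1])

# filter() and map() both take a function as their first argument.
big = list(filter(lambda p: p[1] >= 2, pairs))
squares = list(map(lambda n: n * n, [1, 2, 3]))

print(by_number)  # [('b', 1), ('c', 2), ('a', 3)]
print(big)        # [('a', 3), ('c', 2)]
print(squares)    # [1, 4, 9]
```

In each case the function is used once and never needs a name, which is the situation lambdas were designed for.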

Are Lambdas Pythonic or Not?

The Python style guide (PEP 8) does not forbid lambda expressions, but it does recommend that users NOT assign them to a name:

Always use a def statement instead of an assignment statement that binds a lambda expression directly to an identifier.
Yes:

def f(x): 
    return 2*x

No:
f = lambda x: 2*x

The logic behind this probably has more to do with readability than with any personal vendetta against lambda expressions. Admittedly, they can make it a bit more difficult to understand the use case, but as a coder who prefers efficiency and simplicity in code, I do feel that there’s a place for them.

However, readability has to be the most important feature of any code, arguably more important than how efficiently it runs.
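One concrete difference between the two spellings is the name the function object carries, which is what shows up in tracebacks and printed representations. A quick sketch, with `double` and `double_lambda` as illustrative names:

```python
def double(x):
    return 2 * x


double_lambda = lambda x: 2 * x

print(double.__name__)         # double
print(double_lambda.__name__)  # <lambda>
```

With many lambdas in a codebase, every one of them reports the same generic `<lambda>` name, which makes error messages harder to follow.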

Example Math Formulas

Mean:

mu = lambda x: sum(x) / len(x)

Variance:

variance = lambda x: sum((xi - mu(x))**2 for xi in x) / (len(x) - 1)
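As a cross-check, here is a sketch of the same two formulas written as named functions over a plain Python list and compared with the standard library's `statistics` module. The names `mean` and `sample_variance` and the sample data are illustrative.

```python
import statistics


def mean(xs):
    return sum(xs) / len(xs)


def sample_variance(xs):
    m = mean(xs)  # compute the mean once, not once per element
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(mean(data))                 # 5.0
print(sample_variance(data))      # 32/7, roughly 4.571
print(statistics.variance(data))  # same value from the standard library
```

Dividing by `len(xs) - 1` gives the sample variance, which is also what `statistics.variance` computes.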

Thanks for reading! If you have any messages, please let me know!

The New Jupyter Book

Jupyter Book extends the notebook idea

Davide Camera

Aug 25 · 11 min read

2020–08–07 | On the Jupyter blog, Chris Holdgraf announces a rewrite of the Jupyter Book project.

“Jupyter Book is an open source project for building beautiful, publication-quality books, websites, and documents from source material that contains computational content. With this post, we’re happy to announce that Jupyter Book has been re-written from the ground up, making it easier to install, faster to use, and able to create more complex publishing content in your books. It is now supported by the Executable Book Project, an open community that builds open source tools for interactive and executable documents in the Jupyter ecosystem and beyond.”

What does the new Jupyter Book do?

The new version of Jupyter Book will feel very similar. However, it has a lot of new features due to the new Jupyter Book stack underneath (more on that later).

The new Jupyter Book has the following main features:

✅ Write publication-quality content in markdown
You can write in either Jupyter markdown, or an extended flavor of markdown with publishing features. This includes support for rich syntax such as citations and cross-references, math and equations, and figures.

✅ Write content in Jupyter Notebooks
This allows you to include your code and outputs in your book. You can also write notebooks entirely in markdown to execute when you build your book.

✅ Execute and cache your book’s content
For .ipynb and markdown notebooks, execute code and insert the latest outputs into your book. In addition, cache and re-use outputs to be used later.

✅ Insert notebook outputs into your content
Generate outputs as you build your documentation, and insert them in-line with your content across pages.

✅ Add interactivity to your book
You can toggle cell visibility, include interactive outputs from Jupyter, and connect with online services like Binder.

✅ Generate a variety of outputs
This includes single- and multi-page websites, as well as PDF outputs.

✅ Build books with a simple command-line interface
You can quickly generate your books with one command, like so: jupyter-book build mybook/

These are just a few of the major changes that we’ve made. For a more complete idea of what you can do, check out the Jupyter Book documentation.

An enhanced flavor of markdown

The biggest enhancement to Jupyter Book is support for the MyST Markdown language. MyST stands for “Markedly Structured Text”, and is a flavor of markdown that implements all of the features of the Sphinx documentation engine, allowing you to write scientific publications in markdown. It draws inspiration from RMarkdown and the reStructuredText ecosystem of tools. Anything you can do in Sphinx, you can do with MyST as well.

MyST Markdown is a superset of Jupyter Markdown (AKA, CommonMark), meaning that any default markdown in a Jupyter Notebook is valid in Jupyter Book. If you’d like extra features in markdown such as citations, figures, references, etc, then you may include extra MyST Markdown syntax in your content.

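For example, you can include a citation in the new Jupyter Book with the {cite} role, written as {cite}`mybibtexkey`, where mybibtexkey stands for the key of an entry in your BibTeX file.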

A smarter build system

While the old version of Jupyter Book used a combination of Python and Jekyll to build your book’s HTML, the new Jupyter Book uses Python all the way through. This means that building the HTML for your book is as simple as:

jupyter-book build mybookname/

In addition, the new build system leverages Jupyter Cache to execute notebook content only if the code is updated, and to insert the outputs from the cache at build time. This saves you time by avoiding the need to re-execute code that hasn’t been changed.
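The idea of re-executing only changed code can be sketched with a content hash as the cache key. This is an illustration of the general technique, not Jupyter Cache's actual implementation; `run_cached`, `fake_execute` and the in-memory `_cache` dictionary are all hypothetical.

```python
import hashlib

_cache = {}  # hypothetical in-memory store: hash of the code -> its output


def run_cached(code, execute):
    """Call execute(code) only if this exact code has not been run before."""
    key = hashlib.sha256(code.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = execute(code)
    return _cache[key]


calls = []


def fake_execute(code):
    # Stand-in for a real notebook execution; records each real run.
    calls.append(code)
    return f"output of {code!r}"


run_cached("print(1)", fake_execute)
run_cached("print(1)", fake_execute)  # unchanged code: served from the cache
run_cached("print(2)", fake_execute)  # changed code: executed again

print(len(calls))  # 2
```

Because the key depends only on the code text, any edit to a cell produces a new key and triggers a fresh run.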

More book output types

By leveraging Sphinx, Jupyter Book will be able to support more complex outputs than just an HTML website. For example, we are currently prototyping PDF Outputs, both via HTML as well as via LaTeX. This gives Jupyter Book more flexibility to generate the right book for your use case.

You can also run Jupyter Book on individual pages. This means that you can write single-page content (like a scientific article) entirely in Markdown.

A new stack

The biggest change under-the-hood is that Jupyter Book now uses the Sphinx documentation engine instead of Jekyll for building books. By leveraging the Sphinx ecosystem, Jupyter Book can more effectively build on top of community tools, and can contribute components back to the broader community.

Instead of being a single repository, the old Jupyter Book repository has now been separated into several modular tools. Each of these tools can be used on their own in your Sphinx documentation, and they can be coordinated together via Jupyter Book:

The MyST markdown parser for Sphinx allows you to write fully-featured Sphinx documentation in Markdown.
MyST-NB is an .ipynb parser for Sphinx that allows you to use MyST Markdown in your notebooks. It also provides tools for execution, caching, and variable insertion of Jupyter Notebooks in Sphinx.
The Sphinx Book Theme is a beautiful book-like theme for Sphinx, built on top of the PyData Sphinx Theme.
Jupyter Cache allows you to execute a collection of notebooks and store their outputs in a hashed database. This lets you cache your notebook’s output without including it in the .ipynb file itself.
Sphinx-Thebe converts your “static” HTML page into an interactive page with code cells that are run remotely by a Binder kernel.
Finally, Jupyter Book also supports a growing collection of Sphinx extensions, such as sphinx-copybutton, sphinx-togglebutton, sphinx-comments, and sphinx-panels.

What next?

Jupyter Book and its related projects will continue to be developed as a part of the Executable Book Project, a community that builds open source tools for high-quality scientific publications from computational content in the Jupyter ecosystem and beyond.

Overview and installation

Install the command-line interface

First off, make sure you have the CLI installed so that you can work with Jupyter Book. The Jupyter-Book CLI allows you to build and control your Jupyter Book. You can install it via pip with the following command:

pip install -U jupyter-book

The book building process

Building a Jupyter Book broadly consists of three steps:

Put your book content in a folder or a file. Jupyter Book needs the following pieces in order to build your book:

Your content file(s) (the pages of your book) in either markdown or Jupyter Notebooks.
A Table of Contents YAML file (_toc.yml) that defines the structure of your book. Mandatory when building a folder.
(optional) A configuration file (_config.yml) to control the behavior of Jupyter Book.

Build your book. Using Jupyter Book’s command-line interface you can convert your pages into either an HTML or a PDF book.

Host your book’s HTML online. Once your book’s HTML is built, you can host it online as a public website. See Publish your book online for more information.
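Before building, it can help to check that a folder has the pieces listed above. The following is a minimal sketch using only the standard library; `missing_pieces` is a made-up helper, not part of Jupyter Book, and the file contents written in the demo are placeholders.

```python
import tempfile
from pathlib import Path


def missing_pieces(book_dir):
    """Return a list of problems that would likely stop a folder build."""
    book = Path(book_dir)
    problems = []
    if not (book / "_toc.yml").is_file():
        problems.append("_toc.yml is missing")
    content = [p for p in book.rglob("*") if p.suffix in {".md", ".ipynb"}]
    if not content:
        problems.append("no .md or .ipynb content files found")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    print(missing_pieces(tmp))  # both problems reported for an empty folder

    (Path(tmp) / "_toc.yml").write_text("- file: intro\n")  # placeholder TOC
    (Path(tmp) / "intro.md").write_text("# Intro\n")
    print(missing_pieces(tmp))  # []
```

The optional _config.yml is deliberately not checked, since a book builds without it.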

Create a template Jupyter Book

We’ll use a small template book to show what kinds of files you might put inside your own. To create a new Jupyter Book, type the following at the command-line:

jupyter-book create mybookname

A new book will be created at the path that you’ve given (in this case, mybookname/).

If you would like to quickly generate a basic Table of Contents YAML file, run the following command:

jupyter-book toc mybookname/

And it will generate a TOC for you. Note that there must be at least one content file in each folder in order for any sub-folders to be parsed.

Inspecting your book’s contents

Let’s take a quick look at some important files in the demo book you created:

mybookname/
├── _config.yml
├── _toc.yml
├── content.md
├── intro.md
├── markdown.md
├── notebooks.ipynb
└── references.bib

Here’s a quick rundown of the files you can modify for yourself, and that ultimately make up your book.

Book configuration

All of the configuration for your book is in the following file:

mybookname/
├── _config.yml

You can define metadata for your book (such as its title), add a book logo, turn on different “interactive” buttons (such as a Binder button for pages built from a Jupyter Notebook), and more.

The top-most level of your TOC file consists of book chapters. Above, this is the “Features” page. Note that in this case the title of the page is not explicitly specified but is inferred from the source files. This behavior is controlled by the page_titles setting in _config.yml (see Files for more details). Each chapter can have several sections (defined in sections:) and each section can have several sub-sections. For more information about how section structure maps onto book structure, see How headers and sections map onto book structure.

Each item in the _toc.yml file points to a single file. The links should be relative to your book’s folder and with no extension.

For example, if there is a file at mybookname/features/notebooks.ipynb, the TOC entry that points to this file is:

- file: features/notebooks
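The rule "relative to the book folder, no extension" is easy to express with `pathlib`. A small sketch, where `toc_entry` is a hypothetical helper and not a Jupyter Book function:

```python
from pathlib import Path


def toc_entry(book_dir, content_file):
    """Turn a content file path into the value used after 'file:' in _toc.yml."""
    relative = Path(content_file).relative_to(book_dir)
    return relative.with_suffix("").as_posix()


print(toc_entry("mybookname", "mybookname/features/notebooks.ipynb"))
# features/notebooks
```

`as_posix()` keeps forward slashes in the entry even when the script runs on Windows.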

Book content

The markdown and ipynb files in your folder are your book’s content. Some content files for the demo book are shown below:

mybookname/
...
├── content.md
└── notebooks.ipynb

Note that the content files are either Jupyter Notebooks or Markdown files. These are the files that define “sections” in your book.

You can store these files in whatever collection of folders you’d like. Note that the structure of your book when it is built will depend solely on the order of items in your _toc.yml file.

Book bibliography for citations

If you’d like to build a bibliography for your book, you can do so by including the following file:

mybookname/
└── references.bib

This BibTeX file can be used to insert citations into your book’s pages. For more information, see Citations and cross-references.

Next step: build your book

Now that you’ve got a Jupyter Book folder structure, we can create the HTML (or PDF) for each of your book’s pages.

Build your book

Once you’ve added content and configured your book, it’s time to build outputs for your book. We’ll use the jupyter-book build command-line tool for this.

Currently, there are two kinds of supported outputs: an HTML website for your book, and a PDF that contains all of the pages of your book that is built from the book HTML.

Prerequisites

In order to build the HTML for each page, you should have followed the steps in creating your Jupyter Book structure. You should have a collection of notebook/markdown files in your mybookname/ folder, a _toc.yml file that defines the structure of your book, and any configuration you’d like in the _config.yml file.

Build your book’s HTML

Now that your book’s content is in your book folder and you’ve defined your book’s structure in _toc.yml, you can build the HTML for your book.

Note: HTML is the default builder.

Do so by running the following command:

jupyter-book build mybookname/

This will generate a fully-functioning HTML site using a static site generator. The site will be placed in the _build/html folder. You can then open the pages in the site by entering that folder and opening the html files with your web browser.

Note: You can also use the short-hand jb for jupyter-book. For example: jb build mybookname/.

Build a standalone page

Sometimes you’d like to build a single page of content rather than an entire book. For example, if you’d like to generate a web-friendly HTML page from a Jupyter Notebook for a report or publication.

You can generate a standalone HTML file for a single page of the Jupyter Book using the same command:

jupyter-book build path/to/mypage.ipynb

This will execute your content and output the proper HTML in a _build/html folder.

Your page will be called mypage.html. This will work for any content source file that is supported by Jupyter Book.

Note: Building single pages in the context of a larger project can trigger warnings and incomplete links. For example, building docs/start/overview.md will issue a bunch of “unknown document”, “term not in glossary”, and “undefined links” warnings.

Page caching

By default, Jupyter Book will only build the HTML for pages that have been updated since the last time you built the book. This helps reduce the amount of unnecessary time needed to build your book. If you’d like to force Jupyter Book to re-build a particular page, you can either edit the corresponding file in your book’s folder, or delete that page’s HTML in the _build/html folder.
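One common way to decide whether a page is out of date is to compare file modification times. The sketch below illustrates that general idea only; it is an assumption for illustration, not a description of how Jupyter Book decides, and `needs_rebuild` is a made-up name.

```python
import os
import tempfile


def needs_rebuild(source, output):
    """True if the output is missing or older than its source file."""
    if not os.path.exists(output):
        return True
    return os.path.getmtime(source) > os.path.getmtime(output)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "intro.md")
    out = os.path.join(tmp, "intro.html")
    open(src, "w").close()
    print(needs_rebuild(src, out))  # True: no HTML yet

    open(out, "w").close()
    os.utime(src, (1000, 1000))     # pretend the source is old
    os.utime(out, (2000, 2000))     # and the HTML was built later
    print(needs_rebuild(src, out))  # False: output is up to date

    os.utime(src, (3000, 3000))     # source edited after the build
    print(needs_rebuild(src, out))  # True: page must be rebuilt
```

Both ways of forcing a rebuild mentioned above, editing the source or deleting the HTML, make this check return True.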

Local preview

To preview your book, you can open the generated HTML files in your browser. Either double-click the html file in your local folder, or enter the absolute path to the file in your browser navigation bar, adding file:// at the beginning (e.g. file:///Users/my_path_to_book/_build/html/index.html).
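If you would rather not type the file:// address by hand, Python can build it for you. This sketch assumes the site was written to the _build/html folder inside your book folder, as described in the build step; `preview_url` is an illustrative helper name.

```python
from pathlib import Path


def preview_url(book_dir):
    """Return a file:// URL for the book's built index page."""
    index = Path(book_dir).resolve() / "_build" / "html" / "index.html"
    return index.as_uri()


url = preview_url("mybookname")
print(url)

# To open it in your default browser, uncomment the next two lines:
# import webbrowser
# webbrowser.open(url)
```

`as_uri()` takes care of the correct number of slashes and of escaping spaces in the path.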

Next step: publish your book

Now that you’ve created the HTML for your book, it’s time to publish it online.

Publish your book online

Once you’ve built the HTML for your book, you can host it online. The best way to do this is with a service that hosts static websites (because that’s what you have just created with Jupyter Book). There are many options for doing this, and these sections cover some of the more popular ones.

Create an online repository for your book

Regardless of the approach you use for publishing your book online, it will require you to host your book’s content in an online repository such as GitHub. This section describes one approach you can use to create your own GitHub repository and add your book’s content to it.

1. First, log in to GitHub, then go to the “create a new repository” page: https://github.com/new
2. Next, give your online repository a name and a description. Make your repository public and do not initialize with a README file, then click “Create repository”.
3. Now, clone the (currently empty) online repository to a location on your local computer. You can do this via the command line with:

git clone https://github.com/<my-org>/<my-repository-name>

4. Copy all of your book files and folders into this newly cloned repository. For example, if you created your book locally with jupyter-book create mylocalbook and your new repository is called myonlinebook, you could do this via the command line with:

cp -r mylocalbook/* myonlinebook/

5. Now you need to sync your local and remote (i.e., online) repositories. You can do this with the following commands:

cd myonlinebook
git add ./*
git commit -m "adding my first book!"
git push

Bringing the best out of Jupyter Notebooks for Data Science

Enhance Jupyter Notebook’s productivity with these Tips & Tricks.

Parul Pandey

Dec 19, 2018 · 9 min read

Reimagining what a Jupyter notebook can be and what can be done with it.

Netflix aims to provide personalized content to its 130 million viewers. One of the main ways data scientists and engineers at Netflix interact with their data is through Jupyter notebooks. Notebooks support collaborative, extensible, scalable, and reproducible data science. For many of us, the Jupyter Notebook is the de facto platform for quick prototyping and exploratory analysis. However, there is more to it than meets the eye: much of Jupyter’s functionality lies under the hood and is not adequately explored. Let us explore the Jupyter Notebook features that can enhance our productivity.

1. Executing Shell Commands

The notebook is the new shell

The shell is a way to interact textually with the computer. The most popular Unix shell is Bash (Bourne Again SHell). Bash is the default shell on most modern implementations of Unix and in most packages that provide Unix-like tools for Windows.

When working with a Python interpreter, we regularly need to switch between the interpreter (for example, IDLE) and the shell to use command-line tools. The Jupyter Notebook, however, lets us execute shell commands from within the notebook by placing a ! before the command. Any command that works at the command line can be used in IPython by prefixing it with the ! character.

In [1]: !ls
example.jpeg list tmp

In [2]: !pwd
/home/Parul/Desktop/Hello World Folder

In [3]: !echo "Hello World"
Hello World

We can even pass values to and from the shell as follows:

In [4]: files = !ls

In [5]: print(files)
['example.jpeg', 'list', 'tmp']

In [6]: directory = !pwd

In [7]: print(directory)
['/Users/Parul/Desktop/Hello World Folder']

In [8]: type(directory)
IPython.utils.text.SList

Notice that the returned result is not a plain Python list. It is an IPython.utils.text.SList, a list subclass that IPython uses for captured shell output.
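Outside IPython the ! syntax is not available. As a rough plain-Python counterpart to files = !ls, the sketch below captures a command's output as a list of lines using the standard subprocess module. The helper name run_lines is hypothetical, and the Python interpreter is used as a portable stand-in for a shell command.

```python
import subprocess
import sys


def run_lines(command: list[str]) -> list[str]:
    """Run a command and return its standard output as a list of lines."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


# The current interpreter stands in for a shell command such as echo.
lines = run_lines([sys.executable, "-c", "print('Hello World')"])
print(lines)
```

Unlike IPython's SList, the result here is an ordinary list with no extra helper methods.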

2. Jupyter Themes

Theme-ify your Jupyter Notebooks!

If you are a person who gets bored while staring at the white background of the Jupyter notebook, themes are just for you. The themes also enhance the presentation of the code. You can find more about Jupyter themes here. Let’s get to the working part.

Installation

pip install jupyterthemes

List of available themes

jt -l

Currently, the available themes are chesterish, grade3, gruvboxd, gruvboxl, monokai, oceans16, onedork, solarizedd, and solarizedl.

# selecting a particular theme
jt -t <name of the theme>

# reverting to the original theme
jt -r

You will have to reload the Jupyter notebook every time you change the theme to see the effect take place.
The same commands can also be run from within the Jupyter Notebook by placing ‘!’ before the command.

Left: original | Middle: Chesterish Theme | Right: solarizedl theme

3. Notebook Extensions

Extend the possibilities

Notebook extensions let you move beyond the general vanilla way of using the Jupyter Notebooks. Notebook extensions (or nbextensions) are JavaScript modules that you can load on most of the views in your Notebook’s frontend. These extensions modify the user experience and interface.

Installation

Installation with conda:

conda install -c conda-forge jupyter_nbextensions_configurator

Or with pip:

pip install jupyter_contrib_nbextensions && jupyter contrib nbextension install

# in case you get permission errors on macOS:
pip install jupyter_contrib_nbextensions && jupyter contrib nbextension install --user

Start a Jupyter notebook now, and you should be able to see an NBextensions Tab with a lot of options. Click the ones you want and see the magic happen.

If you cannot find the tab, a second, smaller nbextension configurator can be found under the Edit menu.

Let us discuss some of the useful extensions.

1. Hinterland

Hinterland opens the code autocompletion menu on every keypress in a code cell, instead of only when you press Tab. This makes the Jupyter notebook’s autocompletion behave like that of popular IDEs such as PyCharm.

2. Snippets

This extension adds a drop-down menu to the Notebook toolbar that allows easy insertion of code snippet cells into the current notebook.

3. Split Cells Notebook

This extension splits the cells of the notebook and places them adjacent to each other.

4. Table of Contents

This extension collects all running headers and displays them in a floating window, as a sidebar, or with a navigation menu. The table of contents window is draggable, resizable, collapsible, and dockable.

5. Collapsible Headings

Collapsible Headings allows the notebook to have collapsible sections, separated by headings. So in case you have a lot of dirty code in your notebook, you can simply collapse it to avoid scrolling it again and again.

6. Autopep8

Autopep8 helps to reformat/prettify the contents of code cells with just a click. If you are tired of hitting the spacebar again and again to format the code, autopep8 is your savior.

4. Jupyter Widgets

Make notebooks interactive

Widgets are eventful Python objects that have a representation in the browser, often as a control such as a slider or a text box. Widgets can be used to build interactive GUIs for notebooks.

Installation

# pip
pip install ipywidgets
jupyter nbextension enable --py widgetsnbextension

# Conda
conda install -c conda-forge ipywidgets
# Installing ipywidgets with conda automatically enables the extension

Let us have a look at some of the widgets. For complete details, you can visit their Github repository.

Interact

The interact function (ipywidgets.interact) automatically creates user interface (UI) controls for exploring code and data interactively. It is the easiest way to get started using IPython's widgets.

# Start with some imports!
from ipywidgets import interact
import ipywidgets as widgets

1. Basic Widgets

def f(x):
    return x

# Integers generate a slider
interact(f, x=10);

# Booleans generate check-boxes
interact(f, x=True);

# Strings generate text areas
interact(f, x='Hi there!');

2. Advanced Widgets

Here is a list of some of the useful advanced widgets.

Play Widget

The Play widget is useful to perform animations by iterating on a sequence of integers at a certain speed. The value of the slider below is linked to the player.

play = widgets.Play(
    # interval=10,
    value=50,
    min=0,
    max=100,
    step=1,
    description="Press play",
    disabled=False
)
slider = widgets.IntSlider()
widgets.jslink((play, 'value'), (slider, 'value'))
widgets.HBox([play, slider])

Date picker

The date picker widget works in Chrome and IE Edge but does not currently work in Firefox or Safari because they do not support the HTML date input field.

widgets.DatePicker(
    description='Pick a Date',
    disabled=False
)

Color picker

widgets.ColorPicker(
    concise=False,
    description='Pick a color',
    value='blue',
    disabled=False
)

Tabs

tab_contents = ['P0', 'P1', 'P2', 'P3', 'P4']
children = [widgets.Text(description=name) for name in tab_contents]
tab = widgets.Tab()
tab.children = children
for i in range(len(children)):
    tab.set_title(i, str(i))
tab

5. Qgrid

Make Data frames intuitive

Qgrid is also a Jupyter notebook widget, but it is focused mainly on DataFrames. It uses SlickGrid to render pandas DataFrames within a Jupyter notebook. This allows you to explore your DataFrames with intuitive scrolling, sorting, and filtering controls, as well as edit your DataFrames by double-clicking cells. The GitHub repository contains more details and examples.

Installation

Installing with pip:

pip install qgrid
jupyter nbextension enable --py --sys-prefix qgrid

# only required if you have not enabled the ipywidgets nbextension yet
jupyter nbextension enable --py --sys-prefix widgetsnbextension

Installing with conda:

# only required if you have not added conda-forge to your channels yet
conda config --add channels conda-forge

conda install qgrid

6. Slideshow

Code is great when communicated.

Notebooks are an effective tool for teaching and for writing explainable code. However, when we want to present our work, we either display the entire notebook (with all its code) or fall back on PowerPoint. Not anymore. Jupyter Notebooks can easily be converted to slides, and we can choose what to show and what to hide from the notebook.

There are two ways to convert the notebooks into slides:

1. Jupyter Notebook’s built-in Slide option

Open a new notebook and navigate to View → Cell Toolbar → Slideshow. A light grey bar appears on top of each cell, and you can customize the slides.

Now go to the directory where the notebook is present and enter the following code:

jupyter nbconvert *.ipynb --to slides --post serve
# insert your notebook name instead of *.ipynb

The slides get displayed at port 8000. Also, a .html file will be generated in the directory, and you can also access the slides from there.

This would look even more classy with a themed background. Let us apply the theme ’onedork’ to the notebook and then convert it into a slideshow.

These slides have one drawback: you can see the code but cannot edit it. The RISE plugin offers a solution.

2. Using the RISE plugin

RISE is an acronym for Reveal.js - Jupyter/IPython Slideshow Extension. It uses reveal.js to run the slideshow. This is very useful because it also lets you run code without having to exit the slideshow.

Installation

1 — Using conda (recommended):

conda install -c damianavila82 rise

2 — Using pip (less recommended):

pip install RISE

and then two more steps to install the JS and CSS in the proper places:

jupyter-nbextension install rise --py --sys-prefix

# enable the nbextension:
jupyter-nbextension enable rise --py --sys-prefix

Let us now use RISE for the interactive slideshow. We shall re-open the Jupyter Notebook we created earlier. Now we notice a new extension that says “Enter/Exit RISE Slideshow.”

Click on it, and you are good to go. Welcome to the world of interactive slides.

Refer to the documentation for more information.

7. Embedding URLs, PDFs, and YouTube Videos

Display it right there!

Why settle for mere links when you can easily embed a URL, a PDF, or a video in your Jupyter Notebooks using IPython’s display module?

URLs

# Note that http URLs will not be displayed. Only https URLs are allowed inside the IFrame.
from IPython.display import IFrame
IFrame('https://en.wikipedia.org/wiki/HTTPS', width=800, height=450)

PDFs

from IPython.display import IFrame
IFrame('https://arxiv.org/pdf/1406.2661.pdf', width=800, height=450)

Youtube Videos

from IPython.display import YouTubeVideo
YouTubeVideo('mJeNghZXtMo', width=800, height=300)

Conclusion

These were some of the features of the Jupyter Notebooks that I found useful and worth sharing. Some of them would be obvious to you while some may be new. So, go ahead and experiment with them. Hopefully, they will be able to save you some time and give you a better UI experience. Also feel free to suggest other useful features in the comments.

Rank	Game	Publisher
1	리니지M	NCSOFT
2	리니지2M	NCSOFT
3	바람의나라: 연	NEXON Company
4	R2M	Webzen Inc.
5	기적의 검	4399 KOREA
6	뮤 아크엔젤	Webzen Inc.
7	KartRider Rush+	NEXON Company
8	V4	NEXON Company
9	블레이드&소울 레볼루션	Netmarble
10	라그나로크 오리진	GRAVITY Co., Ltd.
11	가디언 테일즈	Kakao Games Corp.
12	라이즈 오브 킹덤즈	LilithGames
13	일루전 커넥트	ChangYou
14	리니지2 레볼루션	Netmarble
15	Epic Seven	Smilegate Megaport
16	A3: 스틸얼라이브	Netmarble
17	그랑삼국	YOUZU(SINGAPORE)PTE.LTD.
18	AFK 아레나	LilithGames
19	FIFA ONLINE 4 M by EA SPORTS™	NEXON Company
20	스테리테일	4399 KOREA
21	슬램덩크	DeNA HONG KONG LIMITED
22	FIFA Mobile	NEXON Company
23	PUBG MOBILE	PUBG CORPORATION
24	Lords Mobile: Kingdom Wars	IGG.COM
25	동방불패 모바일	Perfect World Korea
26	메이플스토리M	NEXON Company
27	Roblox	Roblox Corporation
28	마구마구 2020	Netmarble
29	왕좌의게임:윈터이즈커밍	YOOZOO GAMES KOREA CO., LTD.
30	Age of Z Origins	Camel Games Limited
31	Gardenscapes	Playrix
32	Rise of Empires: Ice and Fire	Long Tech Network Limited
33	검은사막 모바일	PEARL ABYSS
34	Pmang Poker : Casino Royal	NEOWIZ corp
35	Empires & Puzzles: Epic Match 3	Small Giant Games
36	한게임 포커	NHN BIGFOOT
37	Summoners War	Com2uS
38	황제라 칭하라	Clicktouch Co., Ltd.
39	Brawl Stars	Supercell
40	Homescapes	Playrix
41	에오스 레드	BluePotion Games
42	Random Dice: PvP Defense	111%
43	Teamfight Tactics: League of Legends Strategy Game	Riot Games, Inc
44	일곱 개의 대죄: GRAND CROSS	Netmarble
45	Lord of Heroes	CloverGames
46	케페우스M	Ujoy Games
47	Last Shelter: Survival	Long Tech Network Limited
48	카이로스 : 어둠을 밝히는 자	Longtu Korea Inc.
49	컴투스프로야구2020	Com2uS
50	궁3D	WISH INTERACTIVE TECHNOLOGY LIMITED

Please Stop Doing These 5 Things in Pandas

These mistakes are super common and super easy to fix.

Preston Badeer

Feb 23 · 5 min read

As someone who did over a decade of development before moving into Data Science, I see data scientists make a lot of mistakes while using Pandas. The good news is that these are really easy to avoid, and fixing them can also make your code more readable.

Mistake 1: Getting or Setting Values Slowly

It’s nobody’s fault that there are way too many ways to get and set values in Pandas. In some situations, you have to find a value using only an index or find the index using only the value. However, in many cases, you’ll have many different ways of selecting data at your disposal: index, value, label, etc.

In those situations, I prefer to use whatever is fastest. Here are some common choices from slowest to fastest. In these tests the fastest option finished in about 1/88 of the time of the slowest (254 ms versus 22.3 s)!

Tests were run using a DataFrame of 20,000 rows. Here’s the notebook if you want to run it yourself.

# .at - 22.3 seconds
for i in range(df_size):
    df.at[i] = profile
Wall time: 22.3 s

# .iloc - 15% faster than .at
for i in range(df_size):
    df.iloc[i] = profile
Wall time: 19.1 s

# .loc - 30% faster than .at
for i in range(df_size):
    df.loc[i] = profile
Wall time: 16.5 s

# .iat, doesn't work for replacing multiple columns of data.
# Fast but isn't comparable since I'm only replacing one column.
for i in range(df_size):
    df.iloc[i].iat[0] = profile['address']
Wall time: 3.46 s

# .values / .to_numpy() - about 88x faster than .at (22.3 s vs. 254 ms)
for i in range(df_size):
    df.values[i] = profile
    # Recommend using to_numpy() instead if you have Pandas 1.0+
    # df.to_numpy()[i] = profile
Wall time: 254 ms

(As Alex Bruening and miraculixx noted in the comments, for loops are not the ideal way to perform actions like this; look at .apply() instead. I’m using them here purely to show the speed difference of the line inside the loop.)
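To make the point about loops concrete, here is a minimal sketch that sets every row to the same values twice: once row by row, and once with a single column-wise assignment. The tiny DataFrame and the contents of profile are made up for illustration and are not the data behind the timings above.

```python
import pandas as pd

# Small stand-in for the 20,000-row DataFrame (values are made up).
df = pd.DataFrame({"name": ["a", "b", "c"], "address": ["x", "y", "z"]})
profile = {"name": "Jane Doe", "address": "1 Example Street"}

# Row-by-row assignment, in the spirit of the timing loops above.
looped = df.copy()
for i in range(len(looped)):
    looped.loc[i] = pd.Series(profile)

# Column-wise alternative: assign each column once for all rows.
vectorized = df.assign(**profile)

print(vectorized)
```

Both versions end up with the same contents, but the second touches each column once instead of touching every row.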

Mistake 2: Only Using 25% of Your CPU

Whether you’re on a server or just your laptop, the vast majority of people never use all the computing power they have. Most processors (CPUs) have 4 cores nowadays, and by default, Pandas will only ever use one.

From the Modin Docs, a 4x speedup on a 4 core machine.

Modin is a Python module built to enhance Pandas by making way better use of your hardware. Modin DataFrames don’t require any extra code and in most cases will speed up everything you do to DataFrames by 3x or more.

Modin acts as more of a plugin than a library since it uses Pandas as a fallback and cannot be used on its own.

The goal of Modin is to augment Pandas quietly and let you keep working without learning a new library. The only line of code most people will need is import modin.pandas as pd replacing your normal import pandas as pd, but if you want to learn more check out the documentation here.

In order to avoid recreating tests that have already been done, I’ve included this picture from the Modin documentation showing how much it can speed up the read_csv() function on a standard laptop.

Please note that Modin is in development, and while I use it in production, you should expect some bugs. Check the Issues in GitHub and the Supported APIs for more information.

Mistake 3: Making Pandas Guess Data Types

When you import data into a DataFrame and don’t specifically tell Pandas the columns and datatypes, Pandas will read the entire dataset into memory just to figure out the data types.

For example, if you have a column full of text Pandas will read every value, see that they’re all strings, and set the data type to “string” for that column. Then it repeats this process for all your other columns.

You can use df.info() to see how much memory a DataFrame uses. That’s roughly the same amount of memory Pandas will consume just to figure out the data types of each column.

Unless you’re tossing around tiny datasets or your columns are changing constantly, you should always specify the data types. To do this, just add the dtype parameter with a dictionary that maps your column names to their data types as strings. For example:

import pandas as pd

pd.read_csv('fake_profiles.csv', dtype={
    'job': 'str',
    'company': 'str',
    'ssn': 'str'
})

Note: This also applies to DataFrames that don’t come from CSVs.
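Specifying data types also protects the values themselves, not just memory. The sketch below uses a made-up, in-memory CSV in place of fake_profiles.csv to show what happens to an ID-like column when Pandas is left to guess.

```python
import io

import pandas as pd

# Made-up CSV content standing in for fake_profiles.csv.
csv_text = "job,company,ssn\nEngineer,Acme,001234567\n"

# Without dtype, Pandas infers ssn as an integer and drops the leading zeros.
inferred = pd.read_csv(io.StringIO(csv_text))

# With dtype, every listed column is read as text exactly as written.
typed = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"job": "str", "company": "str", "ssn": "str"},
)

print(inferred["ssn"].iloc[0])
print(typed["ssn"].iloc[0])
```

The first print shows the number with its leading zeros gone, while the second shows the original text.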

Mistake 4: Leftover DataFrames

One of the best qualities of DataFrames is how easy they are to create and change. The unfortunate side effect of this is most people end up with code like this:

# Change dataframe 1 and save it into a new dataframe
df1 = pd.read_csv('file.csv')
df2 = df1.dropna()
df3 = df2.groupby('thing')

This leaves df1 and df2 in Python memory even though you’ve moved on to df3. Don’t leave extra DataFrames sitting around in memory. If you’re using a laptop, it hurts the performance of almost everything you do. If you’re on a server, it hurts the performance of everyone else on that server (or at some point, you’ll get an “out of memory” error).

Instead, here are some easy ways to keep your memory clean:

Use df.info() to see how much memory a DataFrame is using
Install plugin support in Jupyter, then install the Variable Inspector plugin for Jupyter. If you’re used to having a variable inspector in R-Studio, you should know that R-Studio now supports Python!
If you’re in a Jupyter session already, you can always erase variables without restarting by using del df2
Chain together multiple DataFrame modifications in one line (so long as it doesn’t make your code unreadable): df = df.apply(thing1).dropna()
As Roberto Bruno Martins pointed out, another way to ensure clean memory is to perform operations within functions. You can still unintentionally abuse memory this way, and explaining scope is outside the scope of this article, but if you aren’t familiar I’d encourage you to read this writeup.
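Several of the tips above can be combined in a few lines. This is a minimal sketch with made-up data; the column names thing and value and the function name summarize are placeholders.

```python
import pandas as pd


def summarize(frame: pd.DataFrame) -> pd.Series:
    """Drop incomplete rows and total 'value' per 'thing' in one chain."""
    # Intermediate results are never bound to names, so they can be
    # freed as soon as the chain finishes.
    return frame.dropna().groupby("thing")["value"].sum()


# Made-up data; the last row has a missing 'thing' and is dropped.
df = pd.DataFrame({"thing": ["a", "a", "b", None], "value": [1.0, 2.0, 3.0, 4.0]})
print(df.memory_usage(deep=True).sum())  # bytes currently used by df

totals = summarize(df)
del df  # remove the name so the original frame can be garbage-collected
print(totals)
```

Chaining inside a function means nothing like df2 is left behind, and del removes the last reference to the original frame once it is no longer needed.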

Mistake 5: Manually Configuring Matplotlib

This might be the most common mistake, but it lands at #5 because it’s the least impactful. I see this mistake happen even in tutorials and blog posts from experienced professionals.

Matplotlib is automatically imported by Pandas, and it even sets some chart configuration up for you on every DataFrame.

There’s no need to import and configure it for every chart when it’s already baked into Pandas for you.

Here’s an example of doing it the wrong way. Even though this is a basic chart, it’s still a waste of code:

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.hist(x=df['x'])
ax.set_xlabel('label for column X')
plt.show()

And here’s the right way:

df['x'].plot()

Easier, right? You can do anything on these DataFrame plot objects that you can do to any other Matplotlib plot object. For example:

df['x'].plot.hist(title='Chart title')

I’m sure I’m making other mistakes I don’t know about, but hopefully sharing these known ones with you will help put your hardware to better use, let you write less code, and get more done!

If you’re still looking for more optimizations, you’ll definitely want to read:

3 Insane Secret Weapons for Python

Rank	Game	Publisher
1	리니지M	NCSOFT
2	리니지2M	NCSOFT
3	바람의나라: 연	NEXON Company
4	R2M	Webzen Inc.
5	기적의 검	4399 KOREA
6	뮤 아크엔젤	Webzen Inc.
7	KartRider Rush+	NEXON Company
8	V4	NEXON Company
9	일루전 커넥트	ChangYou
10	라그나로크 오리진	GRAVITY Co., Ltd.
11	라이즈 오브 킹덤즈	LilithGames
12	블레이드&소울 레볼루션	Netmarble
13	FIFA ONLINE 4 M by EA SPORTS™	NEXON Company
14	그랑삼국	YOUZU(SINGAPORE)PTE.LTD.
15	Epic Seven	Smilegate Megaport
16	A3: 스틸얼라이브	Netmarble
17	AFK 아레나	LilithGames
18	리니지2 레볼루션	Netmarble
19	메이플스토리M	NEXON Company
20	스테리테일	4399 KOREA
21	PUBG MOBILE	PUBG CORPORATION
22	동방불패 모바일	Perfect World Korea
23	슬램덩크	DeNA HONG KONG LIMITED
24	Lords Mobile: Kingdom Wars	IGG.COM
25	가디언 테일즈	Kakao Games Corp.
26	Roblox	Roblox Corporation
27	Teamfight Tactics: League of Legends Strategy Game	Riot Games, Inc
28	Gardenscapes	Playrix
29	왕좌의게임:윈터이즈커밍	YOOZOO GAMES KOREA CO., LTD.
30	Pmang Poker : Casino Royal	NEOWIZ corp
31	Brawl Stars	Supercell
32	마구마구 2020	Netmarble
33	Age of Z Origins	Camel Games Limited
34	Rise of Empires: Ice and Fire	Long Tech Network Limited
35	검은사막 모바일	PEARL ABYSS
36	Empires & Puzzles: Epic Match 3	Small Giant Games
37	한게임 포커	NHN BIGFOOT
38	Summoners War	Com2uS
39	FIFA Mobile	NEXON Company
40	황제라 칭하라	Clicktouch Co., Ltd.
41	페이트/그랜드 오더	Netmarble
42	안녕엘라	(주)알피지리퍼블릭
43	케페우스M	Ujoy Games
44	Homescapes	Playrix
45	Random Dice: PvP Defense	111%
46	궁3D	WISH INTERACTIVE TECHNOLOGY LIMITED
47	컴투스프로야구2020	Com2uS
48	Lord of Heroes	CloverGames
49	Last Shelter: Survival	Long Tech Network Limited
50	Cookie Run: OvenBreak - Endless Running Platformer	Devsisters Corporation

Interactive spreadsheets in Jupyter

Martin Renou

Mar 8, 2019 · 4 min read

ipywidgets plays an essential part in the Jupyter ecosystem; it brings interactivity between user and data.

Widgets are eventful Python objects that often have a visual representation in the Jupyter Notebook or JupyterLab: a button, a slider, a text input, a checkbox…

More than a library of interactive widgets, ipywidgets is a powerful framework upon which it is straightforward to create new custom widgets. Developers can quickly start their own widgets library with best practices of code structure and packaging using the widget-cookiecutter project.

You can find examples of really nice widgets libraries in the blog-post: Video streaming in the Jupyter Notebook.

A spreadsheet is an interactive tool for data analysis in a tabular form. It consists of cells and cell ranges. It supports value dependent cell formatting/styling and one can apply mathematical functions on cells and perform chained computations. It is the perfect user interface for statistical and financial operations.

The Jupyter Notebook was lacking a spreadsheet library. That’s where ipysheet comes into play.

ipysheet

ipysheet is a new interactive widgets library that aims at implementing the core features of a good spreadsheet application and more.

There are two main widgets in ipysheet, the Cell widget, and the Sheet widget. We provide helper functions for creating rows, columns and cell ranges in general.

The cell value can be a boolean, a numerical value, a string, a date, and of course another widget!

ipysheet uses a Matplotlib-like API for creating a sheet:

The user can create entire rows, columns, and even cell ranges:

Of course, values in cells are dynamic, the cell value can be dynamically updated from Python and the new value will be visible in the sheet.

It is possible to link a cell value to a widget (in the following screenshot a FloatSlider widget is linked to cell “a”) and to define a specific cell as the result of a custom calculation depending on other cells:

Custom styling can be used, using what we call renderers:

Adding support to NumPy Arrays and Pandas Dataframes loading and exporting was an important feature that we wanted. ipysheet provides from_array, to_array, from_dataframe and to_dataframe functions for this purpose:

Another killer feature is that a cell value can be ANY interactive widget. This means that the user can put a button or a slider widget in a cell:

But it also means that a higher level widget can be put in a cell. Whether the widget is a plot from bqplot, a map from ipyleaflet or even a multi-volume rendering from ipyvolume:

You can try it right now with binder, without the need of installing anything on your computer, just by clicking on this button:

The source code is hosted on Github: https://github.com/QuantStack/ipysheet/

Similar projects

ipyaggrid is a widgets library for importing/editing/exporting Pandas Dataframes: Harnessing the power of ag-Grid in Jupyter
qgrid is an interactive grid for sorting, filtering, and editing Pandas Dataframes in Jupyter notebooks.

Acknowledgments

The development of ipysheet is led by QuantStack.

This development is sponsored by Société Générale and Bloomberg.

About the Authors

Maarten Breddels is an entrepreneur and freelance developer / consultant / data scientist working mostly with Python, C++ and Javascript in the Jupyter ecosystem. Founder of vaex.io. His expertise ranges from fast numerical computation, API design, to 3d visualization. He has a Bachelor in ICT, a Master and PhD in Astronomy, likes to code and solve problems.

Martin Renou is a Scientific Software Engineer at QuantStack. Before joining QuantStack, he studied at the French Aerospace Engineering School SUPAERO. He also worked at Logilab in Paris and Enthought in Cambridge. As an open source developer at QuantStack, Martin worked on a variety of projects, from xsimd, xtensor, xframe, xeus and xeus-python in C++ to ipyleaflet and ipywebrtc in Python and JavaScript.
