Sponsored content
Comment from Tom Miller, Northwestern University Faculty Director MSDS program.
Many years ago, as an applied statistics student at the University of Minnesota, I learned about programming in academia. At the beginning of the course, the professor said:
“I don't care what language you use in your assignments, as long as you do your job.”
Although I had some experience with Fortran, I was trying to teach myself Pascal and adopt a structured programming style.
Taking my professor's word for it, I programmed my first assignment in Pascal while my classmates used Fortran. The first mission is due. I take my thesis (program list) to the front of the room and hand it to the professor. He looks at it suspiciously and asks, “What is this?”
I will explain. “Pascal. I told you we could program in any language we wanted, as long as we did the work ourselves.”
In response, the professor said, “It's Pascal. I don't read Pascal. I only read Fortran.''
Lesson: Academics are not particularly open to new programming languages.
fortran
Fortran was developed by John Backus at IBM and introduced in 1957. When you hear its name, you think of “formula conversion.” Fortran is suitable for numerical computations required in scientific and engineering applications. Fortran has recently experienced a resurgence, perhaps due to large datasets and the computational demands of supercomputing.
pascal
Pascal is a derivative of ALGOL, designed by Swiss computer scientist Nicholas Wirth and introduced in 1970. Pascal was aligned with the movement toward structured programming at many universities during the 1970s and his 1980s. Variations of Pascal are used by Apple and Microsoft for system programming.
Data science students at most universities today would have a similar experience if they submitted an assignment in Go, Rust, or another modern language rather than Python or R.
In machine learning applications and AI, Python rules the day. Data scientists may be content to sail the Python boat with life preservers like Numpy, Pandas, Scikit-learn, and TensorFlow by their side.
But please be careful. Today's ocean of data is choppy. A shark approaches.
Remember what Chief Brody says to Quint in the movie? jaws: “I need a bigger ship.” I suggest building a bigger and faster ship in Go.
Go (Golan)
Go was developed by three Google computer scientists: Robert Griesemer, Rob Pike, and Ken Thompson. It's easier and safer to work with than C, while retaining the performance benefits of C. Go was introduced in 2009 and is Google's primary systems programming language. Go is replacing C/C++, C#, Java, and Python in mission-critical systems in many organizations. Go is sometimes referred to as “Golang” to distinguish it from the Go board game and to provide a more reliable term in search engines.
Data Science Careers: Why Go?
Carmen Andor traced the development of computer languages from 1980 to 2017 in a presentation titled “Why Go?” She made a compelling argument for using Go in large-scale programming projects. Her assertion is still true today.
- Go is machine efficient. This beats not only interpreted languages, but also languages that rely on virtual machines.
- Python arrived on the computer scene more than 30 years ago, before multi-core processors were commonplace. Python is a single-threaded, interpreted language, making it poorly suited for systems that require concurrency.
- Data scientists may be writing in Python, but for computationally intensive tasks, C or C++ does the work. Python is just the “glue” that holds the parts of the machine learning boat together.
- It doesn't take long to find benchmark examples that show Go's advantages over Python and R, the leading languages in data science.
Go, sometimes referred to as the “C” of the 21st century, is a strongly typed language that compiles directly to machine code. It compiles much faster than C and runs about as fast as C.
C, C++, and C#
C was developed by Dennis Ritchie at Bell Laboratories and introduced in 1972. C has been a popular systems programming language for many years because it provides low-level access to memory and is easy to map to machine instructions. C has performance advantages over most other programming languages. C++ and C# provide object-oriented extensions to C while maintaining the structure and performance benefits of C.
Concurrency (no small task) is an essential feature of Go
Go provides a rich set of tools to take advantage of today's multicore digital computers. Data science requires languages and systems that can meet the demands of today's data-driven and data-intensive world. Data science requires Go.
Go is efficient for programmers. Python is often touted as being easy to learn. However, I would argue that Go is easier to learn than Python. Go is a simple language by design, with only 25 keywords. Go is easy to read, easy to use, and easy to maintain over time.
Let's be happy that Go community leaders are reluctant to add new features. Donald Knuth was right. When version 3.14 of TeX was reached, he declared that there would be no new versions of the language or new features, only bug fixes. And each time we fixed a bug, we borrowed a different number from π (pi).
A Go programmer's credo: “Keep it simple. Keep it running.”
Go has a well-defined structure with formatting utilities to ensure a common style (sometimes referred to as “idiomatic Go”) among programmers. Go has automatic memory management (garbage collection) that protects programmers from memory leaks and errors. Go is safer than C or C++.
Go's core developers value backward compatibility, and Go's module system promotes safety and ensures that each build includes the correct packages at compile time. Go tracks software versions as your software stack grows.
Think of software development as a game of Jenga. I would like to access the block at the bottom of the stack while preventing the entire stack from collapsing. Let me do this in Go.
Go simplifies your software stack. What about your software stack and infrastructure?
If Python (even powered by C or C++) is not up to the task, data scientists will turn to other languages and systems. Below are some so-called solutions to Python's performance problems.
To implement high-performance solutions, data scientists turn to Spark, which is built on Scala, which relies on the Java Virtual Machine. And for easy access, these well-intentioned data scientists are adding PySpark to the mix. Is this the best way to deal with Python performance issues?
Consider a simpler software stack. That's Go, just Go:
Using code examples from the 2021 and 2023 GopherCon conferences, Daniel Whitenack shows you how to implement machine learning and artificial intelligence solutions in Go. Go allows you to build integrated and intelligent web applications, including applications that call generative AI and large-scale language models.
Go is a typical systems programming language for today's multicore digital computers. Go is the language of the cloud. Go is a distributed computing language. Data scientists who previously viewed Python as a “glue language” can now view Go as “superglue.”
Go is widely used in industry. Businesses value Go's security, simplicity, and performance. They also recognize Go's strengths as a backend systems programming environment. Go is suitable for developing web servers, database servers, application programming interfaces, and microservices. Go is suitable for implementing scalable, high-performance systems.
Many companies, including Go's birthplace, Google, use it for large-scale, mission-critical systems. If Go is good enough for Google, Netflix, Uber, Dropbox, PayPal, American Express, Capital One, Salesforce, Zillow, and many others, it's good enough for the rest of us.
Because Go can provide an effective platform for building Docker, Kubernetes, Prometheus, Grafana, Pachyderm, Terraform, CrowdStrike, etcd, CockroachDB, Weaviate, milvus, Aerospike, and a variety of other distributed systems and cloud-native microservices. If so, Go is an effective platform for building data science applications.
Computer science and data science educators should learn from industry. I need to add Go to the course. This is what we do at Northwestern.
Three languages for data science at Northwestern
Using Go for data science doesn't mean you have to give up the great features that R and Python offer. We can be multilingual.
It's not hard to imagine a project where data scientists explore data in R, develop models in Python, and implement systems in Go. Of his three languages for data science, Go is the newest. Go is on the rise and offers many job opportunities.
Northwestern's Data Science program values the strength of three languages in data science across the program's specializations.
- R includes numerous packages for analysis and modeling and is highly valued by applied statisticians. Ideal for scientific programming and applied research. R is especially suited for data exploration and visualization. R is the primary language used in most courses in the Analysis and Modeling specialization at Northwestern University.
- Python is currently the most popular computer language for data science. It is particularly good at natural language processing and serves as a primary client for deep learning platforms. Python provides a rich environment for model development and is the primary language for most courses in the Artificial Intelligence specialization at Northwestern University.
- Go is a systems programming language designed for today's multiprocessor computers. Ideal for implementing scalable, high-performance systems for data science, such as web applications and database servers. As shown on his Learning Go for Data Science website, Go is the primary language for Northwestern's data engineering specialization.
Students in Northwestern University's online data science master's degree program develop the critical analytical and leadership skills needed to analyze and interpret data to make informed and impactful decisions in a wide range of fields. Build. Classes are taught by seasoned faculty who are industry experts. Students can choose from either the general data science track or one of her five specializations: Analytics and Modeling, Analytics Management, Artificial Intelligence, Data Engineering, and Technology Entrepreneurship, in areas of interest. Develop your expertise. Students study part-time, completely online and at their own pace. Applications are accepted quarterly.