R 2 Python

r2py
A post documenting my journey to learn Python for data science courses.
Author

Isaac Quintanilla Salinas

Published

December 21, 2024

My Programming Journey

I first learned about programming when I did a Biostatistics internship where we analyzed data using SAS [1]. This was the first time I sent commands to a computer to carry out a task, and I thoroughly enjoyed it. However, when my internship was over, I no longer had access to SAS at my own institution and could not learn more about Statistics or programming because of a huge financial barrier. This is when I learned about R, which is free and open-source software. This was a game changer! I was able to learn more about statistics and programming because the ultimate barrier was removed. Yes, it was completely different from SAS, but I could learn, and no one could stop me.

After CSUMB, I went on to get my Master’s in Public Health [2] at San Diego State University, where I expanded my R skills and really got to learn how to use RStudio. Afterwards, I started my PhD in Applied Statistics at UC Riverside, where I refined my programming skills even further. Additionally, I learned how to program in C++, well, at least via Rcpp, to efficiently complete my dissertation. As you can see, my programming journey primarily focused on languages that helped my Statistics journey, mainly R.

I am now an Assistant Professor of Statistics at CSUCI, where we are planning to launch a new BS in Data Science program in Fall 2025. As part of the program, we are planning to teach an Introduction to Data Science course with Python, in addition to several other courses that will be taught in R or SQL. Therefore, I need to learn Python and SQL to properly train my students.

Although I am learning Python, I am not really planning to switch to it. I will still use R for all my work. In all honesty, I would rather get better at C++ via Rcpp, learn Rust, learn Julia, or truthfully get better at Vim via Neovim. However, I understand that Python is a popular language for data scientists, and it would be a disservice to my students if I did not learn it and properly train them.

How will I learn?

Every time I began to work with Python, I dreaded it. I think because I was coming from R as a Statistician, I was always upset at how difficult it is to begin working in Python. In R, getting started is super easy: install R from CRAN, install RStudio from Posit, and then install R packages, and there we go! In Python, it is install Anaconda, use Jupyter or Spyder, and try to do data analysis. And so I do it for a few days, then I need to run another program, most likely R, and learn that it will no longer run due to changes in R. I later learned that Anaconda changed my system, and I would have to delete it to get back to my work. I do not know what is going on with Anaconda, but changing R on my system without telling me is unacceptable [3]. This led me back to the R community for solutions.

Learning Python, the R way!

Reading a few blog posts online, I found that many R users learning Python recommended simply installing Python, installing VS Code, and having VS Code point to the Python version you would like to work with. Afterwards, install the Python libraries [4] the same way you would install R packages. Fortunately, Posit is developing their next-generation Data Science IDE, called Positron, designed for both R and Python users. What I appreciate the most is that they have instructions to install Python via pyenv. As of right now, it has allowed me to learn Python without changing my system. Additionally, I am able to learn Positron, which is not bad at all. My only gripe is that I cannot move the console to the top-right as I have it in RStudio.
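For R users who want to confirm which Python their editor is actually pointing at, here is a minimal sketch using only the standard library. It prints the interpreter path and version, then reports whether a few libraries can be found. The helper name `check_environment` and the package names passed to it are my own illustrative choices, not part of any official setup instructions.

```python
import importlib.util
import sys


def check_environment(packages):
    """Return a dict mapping each package name to True if Python can find it."""
    # find_spec() returns None when a top-level package is not installed.
    return {name: importlib.util.find_spec(name) is not None for name in packages}


# Which interpreter is running? Roughly what R.home() and R.version tell you in R.
print("Interpreter:", sys.executable)
print("Version:", sys.version.split()[0])

# Hypothetical list of libraries to look for; swap in whatever you plan to use.
status = check_environment(["polars", "plotnine"])
for name, found in status.items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

If the interpreter path is not the one installed through pyenv, the editor is likely pointing at a different Python than intended.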

Intro to Data Science

The main way I am learning Python is by converting Data Science in a Box from an R-based course to a Python-based course. There are several reasons to do it this way. First, I understand what is going on in the course and can read the R code, so I can translate it to Python and know when I am making a mistake. Second, I have not found Python versions of this course [5]; therefore, it will force me to really learn Python. Third, the course is very user-friendly for our students. The topics it covers are fascinating, and I believe they are more suitable for my students. Fourth, their license is very permissive, CC BY-SA 4.0 for the win! Fifth, other faculty members in my department have expressed interest in teaching a Data Science course with this material. Therefore, for all these reasons, and many more, I will be learning Python, well, the basics at least, by converting Data Science in a Box from R to Python.

While I am translating and designing this course, there will be some deviations from the standard data science course. Most of the decisions I make will be based on what is pedagogically more feasible for my students. For example, I will be using plotnine, the ggplot2 of Python, to create graphics, since the syntax is nearly the same as in R. Another is using the polars library, an alternative to pandas, for data frames because of how simple it makes data manipulation [6]. These design changes are meant to show students what is possible without scaring them away from the field. I do not know what other changes I will make, but I am positive there will be more.

Lastly, as I am part of the Southern California Consortium for Data Science, I will try to incorporate their recommendations in the course.

Conclusion

As I continue building this Data Science course, I will try to write blog posts describing my experiences, giving out tips, and possibly sharing tutorials. I will also be writing about my experience with Positron and a few other tools, possibly Quarto [7], and much more.

Disclaimer

While I do want these posts to highlight my raw experiences, they will be edited from time to time to ensure that R programmers can properly learn how to use Python from a statistician's perspective. However, at the end of each post, there will be a reflection that will not change from the original post.

Resources

Here are some resources that I found useful. I will be updating this list as I learn more about Python and Positron.

Footnotes

  1. Yes, I know SAS and R are not considered programming languages, but I honestly do not care. That is gate-keepery language designed to make people feel bad.

  2. In hindsight, this is probably my most important education. It taught me how to work with communities and to be willing to break norms when they are not working.

  3. On a side note: I learned that there may be licensing issues for Anaconda for Universities. So it is an out for me.

  4. I am not sure what packages are called in Python.

  5. I will admit, I have not tried that hard to look for a Python version of the course material.

  6. Also, it is typically much faster than pandas and has an R version as well.

  7. I am pretty good at this.