Optimising R Workflows: Welcome!


Introduction

Dr Anna Krystalli

R-RSE


https://optimising-r.netlify.app

👋 Hello

me: Dr Anna Krystalli

  • Research Software Engineering Consultant, r-rse

    • mastodon annakrystalli@fosstodon.org
    • twitter @annakrystalli
    • github @annakrystalli
    • email r.rse.eu[at]gmail.com
  • Editor rOpenSci

  • Founder & Core Team member ReproHack

Objectives

In this course we’ll explore:

  • Benchmarking and profiling code

  • Best practice for writing performant code in R

  • Best practice in working efficiently with data

  • Parallelising workflows

Background

Computation


Transistor icons created by surang - Flaticon

Computers represent information in binary code: digital 1s and 0s held inside the central processing unit (CPU) and RAM. Physically, each digit is an electrical signal that is either on or off.

Each transistor is a switch, that is, 0 when turned off and 1 when turned on. The more transistors, the more switches.

Transistors are the basic building blocks that regulate the operation of computers, mobile phones and all other modern electronic circuits, and they are the basic unit of the CPU.
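As a small illustration of the 1s and 0s described above, base R's `intToBits()` exposes the bits behind an integer. The value 5 is an arbitrary choice for this sketch.

```r
# intToBits() returns the 32 bits of an integer, least significant bit first.
bits <- as.integer(intToBits(5L))

# Keep the lowest 8 bits and reverse them to read most significant bit first.
byte <- paste(rev(bits[1:8]), collapse = "")
byte  # "00000101"
```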

Moore’s law

At constant price, the number of components that fit on an integrated circuit doubles every 18-24 months, and performance doubles with it. In other words, the computing performance that a dollar can buy more than doubles every 18-24 months.
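To get a feel for what that doubling rate implies, here is a rough back-of-the-envelope calculation. The 10-year horizon is an assumption picked for illustration, not a figure from the slides.

```r
# Growth factor after `years` if capacity doubles every 24 or every 18 months.
years <- 10
doubling_months <- c(slow = 24, fast = 18)

# Number of doublings = elapsed months / months per doubling.
growth <- 2^(years * 12 / doubling_months)
round(growth)  # slow: 32, fast: 102
```

So a decade of Moore's law corresponds to somewhere between a 32-fold and a roughly 100-fold increase.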

Yet…

we’ve hit clock speed stagnation

50 Years of Processor Trends. Distributed by Karl Rupp under a CC-BY 4.0 License

About computer hardware

CPU (Processing)

RAM (memory)

Hard Disks, Networks (I/O)

CPU

  • The central processing unit (CPU), or the processor, is the brains of a computer. The CPU is responsible for performing numerical calculations.

  • The faster the processor, the faster R will run.

  • The clock speed (or clock rate, measured in hertz) is the frequency with which the CPU executes instructions. The faster the clock speed, the more instructions a CPU can execute in a second.
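One simple way to see how processor speed affects R is to time a CPU-bound task with base R's `system.time()`. This is only a sketch: the task and its size are arbitrary, and the timings will differ from machine to machine.

```r
# Time a CPU-bound task: summing the square roots of one million numbers.
timing <- system.time({
  total <- sum(sqrt(seq_len(1e6)))
})

# "elapsed" is wall-clock time in seconds; "user.self" is CPU time spent in R.
timing[["elapsed"]]
timing[["user.self"]]
```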

RAM

  • Random access memory (RAM) is a type of computer memory that can be accessed randomly: any byte of memory can be accessed without touching the preceding bytes.

  • The amount of RAM R has access to is incredibly important. Since R loads objects into RAM, the amount of RAM you have available can limit the size of data set you can analyse. Analyses limited in this way are memory bound.

Even if the original data set is relatively small, your analysis can generate large objects.
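A minimal sketch of that last point, using only base R. The data set here is simulated and its dimensions are arbitrary; `dist()` stands in for any analysis step whose output grows faster than its input.

```r
# A small simulated data set: 2000 rows, 5 numeric columns (about 80 KB).
small <- matrix(rnorm(2000 * 5), nrow = 2000)

# Pairwise distances between all rows: n * (n - 1) / 2 values.
d <- dist(small)
length(d)  # 1999000

# Compare the memory footprint of the input and the derived object.
format(object.size(small), units = "Kb")
format(object.size(d), units = "Mb")
```

The distance object is roughly 200 times larger than the data it was computed from, because its size grows with the square of the number of rows.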

About R

R is an interpreted language

Compiled Languages

Converted directly into machine code that the processor can execute.

  • Tend to be faster and more efficient to execute.

  • Need a “build” step that compiles the code for the system it will run on.

  • Examples: C, C++, Erlang, Haskell, Rust, and Go.

Interpreted Languages

Code is interpreted line by line at run time.

  • Tend to be significantly slower, although just-in-time compilation is closing that gap.

  • Much more compact and easier to write.

  • Examples: R, Ruby, Python, and JavaScript.
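R itself ships a byte-code compiler in the `compiler` package, and since R 3.4.0 the just-in-time compiler is enabled by default. The sketch below compiles a function explicitly with `cmpfun()`; `sum_squares` is a made-up example function.

```r
library(compiler)

# A deliberately loop-heavy function (hypothetical example).
sum_squares <- function(n) {
  total <- 0
  for (i in seq_len(n)) {
    total <- total + i^2
  }
  total
}

# Explicitly byte-compile the function; the result behaves identically.
sum_squares_compiled <- cmpfun(sum_squares)

sum_squares(1000)           # 333833500
sum_squares_compiled(1000)  # 333833500
```

Because the JIT usually compiles functions automatically after a couple of calls, an explicit `cmpfun()` rarely changes timings in recent versions of R.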

R performance

  • R offers some excellent features: dynamic typing, lazy functional evaluation and object-orientation

  • Side effect: operations are undertaken in single-threaded mode, i.e. sequentially

  • Many routines in R are written in compiled languages like C & Fortran.

  • R performance can be enhanced by linking to optimised Linear Algebra Libraries.

  • R offers many ways to parallelise computations.

  • Many packages wrap more performant C, Fortran, C++ code.
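To illustrate the point about compiled routines, the sketch below compares a loop written in R with the built-in `sum()`, which is implemented in C. `loop_sum` is a hypothetical helper written for this comparison, and the exact timings depend on your machine.

```r
# Sum a vector with an explicit loop evaluated by the R interpreter.
loop_sum <- function(v) {
  total <- 0
  for (el in v) {
    total <- total + el
  }
  total
}

x <- runif(1e6)

# Time both approaches; sum() hands the whole vector to compiled C code.
loop_time <- system.time(loop_result <- loop_sum(x))
builtin_time <- system.time(builtin_result <- sum(x))

# Both give the same answer, up to floating-point tolerance.
all.equal(loop_result, builtin_result)

loop_time[["elapsed"]]
builtin_time[["elapsed"]]
```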

R as a language and environment is reasonably well established and understood. A combination of dynamic typing, lazy functional evaluation and object-orientation (in several flavors) makes for a somewhat unique combination.
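Two of these features can be shown in a few lines. This is only an illustration; `double_first` is a made-up function.

```r
# Dynamic typing: the same name can hold values of different types over time.
x <- 1L
class(x)  # "integer"
x <- "one"
class(x)  # "character"

# Lazy evaluation: arguments are only evaluated when the function uses them.
double_first <- function(a, b) {
  a * 2  # b is never used, so it is never evaluated
}

result <- double_first(21, stop("this error is never raised"))
result  # 42
```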

One side-effect of this design is that core operations are undertaken in single-threaded mode, or, in other words, sequentially.
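As a first taste of moving beyond sequential execution, here is a minimal sketch using the `parallel` package that ships with R. `slow_square` is a hypothetical stand-in for an expensive computation, and the choice of two cores is an assumption. `mclapply()` relies on forking, which is unavailable on Windows, so the sketch falls back to one core there.

```r
library(parallel)

# A stand-in for an expensive computation (hypothetical example).
slow_square <- function(x) {
  Sys.sleep(0.1)
  x^2
}

# Forking is not available on Windows, so use a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Sequential: each call runs one after the other.
sequential <- lapply(1:4, slow_square)

# Parallel: calls are spread across n_cores worker processes.
in_parallel <- mclapply(1:4, slow_square, mc.cores = n_cores)

identical(sequential, in_parallel)  # TRUE
```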

About this course

  • I normally like to live code…BUT!

  • There’s a lot of material to get through, so I will be copying & pasting from the materials a lot

  • Have the materials handy to follow along

  • Please stop me for questions or to share your own experiences

  • Lunch around 1pm

Let’s go!
