Version Control and Documentation with GitHub

Dr. Maximilian Hindermann

April 19, 2024

Program

About us

Today’s goal

Course materials

Available right now at https://github.com/RISE-UNIBAS/clean-code

Version control with Git

What is Git?

Local version control

Centralized version control

Distributed version control

Using Git

After installation on your computer, you can use Git:

Git repository hosting services

But where do I host my Git repository? Do I have to configure a server myself?

GitLab at Unibas

In contrast to GitHub, GitLab can be installed on your own infrastructure (e.g., required for sensitive data).

There are several GitLab instances at Unibas, but none is run as an official university-wide service:

National infrastructure

Switch offers a GitLab instance, and c4science supports Git:

Looking at a sample GitHub repository

GitHub repo for these slides: https://github.com/RISE-UNIBAS/clean-code/blob/main/crash-course-github/slides.md

Image credit: Alex Eylar, “Inception”, CC BY-NC-SA 2.0.

Git(Hub) core concepts

Configure your GitHub Pro account

GitHub Free and GitHub Pro accounts differ in functionality. For example, GitHub Free accounts cannot use GitHub Pages for private repositories.

Task 1

As a student, you can get a free GitHub Pro account:

  1. Sign up at https://github.com/ with your university email and choose GitHub Free.
  2. Verify your email.
  3. Fill in your billing information with your full legal name as it appears on your academic affiliation documentation. (You do not have to add a payment method.)
  4. Go to https://education.github.com/benefits and get a free upgrade to GitHub Pro by following the prompts.

Connect to GitHub

There are various ways in which you can interact with GitHub. In this course, we limit ourselves to the following:

  1. The web-interface on https://github.com/ in your browser.
  2. GitHub Desktop.

Task 2

Please install GitHub Desktop on your machine.

If you prefer to use your IDE instead of GitHub Desktop, you are free to do so, but we can only offer limited support, and only for PyCharm and RStudio.

Repository

A repository is the most basic element of GitHub. They’re easiest to imagine as a project’s folder. A repository contains all of the project files (including documentation), and stores each file’s revision history. Repositories can have multiple collaborators and can be either public or private.

From GitHub glossary/repository

Task 3: Create a repository with the GitHub web-interface

Commit

A commit, or “revision”, is an individual change to a file (or set of files). When you make a commit to save your work, Git creates a unique ID (a.k.a. the “SHA” or “hash”) that allows you to keep record of the specific changes committed along with who made them and when. Commits usually contain a commit message which is a brief description of what changes were made.

From GitHub glossary/commit
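
The “unique ID” mentioned above is a hash computed from content. As a minimal sketch, assuming Git's default SHA-1 object format, the following Python function reproduces the ID Git assigns to the content of a single file (a “blob”). Commit IDs are derived in the same way, but from the commit's metadata (file tree, parent commit, author, date, message) instead of a single file. The function name `git_blob_hash` is made up for this illustration.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 ID Git assigns to a file ("blob") with this content."""
    # Git hashes a short header ("blob <size in bytes>\0") followed by the content.
    header = f"blob {len(content)}\0".encode("ascii")
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # The same content always yields the same ID; any change yields a different one.
    print(git_blob_hash(b"hello\n"))
    print(git_blob_hash(b"hello!\n"))
```

If Git is installed, the first printed ID should match what `git hash-object` reports for a file that contains only the line `hello`.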

Task 4: Commit to a repository with the GitHub web-interface

Clone

A clone is a copy of a repository that lives on your computer instead of on a website’s server somewhere, or the act of making that copy. When you make a clone, you can edit the files in your preferred editor and use Git to keep track of your changes without having to be online. The repository you cloned is still connected to the remote version so that you can push your local changes to the remote to keep them synced when you’re online.

From GitHub glossary/clone

Task 5: Sign in to GitHub Desktop

Task 6: Clone a repository with GitHub Desktop

Push

To push means to send your committed changes to a remote repository on GitHub.com. For instance, if you change something locally, you can push those changes so that others may access them.

From GitHub glossary/push

Task 7: Push a commit to remote with GitHub Desktop

Branches

A branch is a parallel version of a repository. It is contained within the repository, but does not affect the primary or main branch allowing you to work freely without disrupting the “live” version. When you’ve made the changes you want to make, you can merge your branch back into the main branch to publish your changes.

From GitHub glossary/branch

Task 8: Create a “new”-branch with the GitHub web-interface

Task 9: Commit to “new”-branch with the GitHub web-interface

Pull

Pull refers to when you are fetching in changes and merging them. For instance, if someone has edited the remote file you’re both working on, you’ll want to pull in those changes to your local copy so that it’s up to date.

From GitHub glossary/pull

Task 10: Switch to “new”-branch with GitHub Desktop

Merge and pull requests

Merging takes the changes from one branch (in the same repository or from a fork), and applies them into another. This often happens as a “pull request” (which can be thought of as a request to merge), or via the command line. A merge can be done through a pull request via the GitHub.com web interface if there are no conflicting changes, or can always be done via the command line.

From GitHub glossary/merge
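
What “no conflicting changes” means can be illustrated with a simplified three-way merge. The sketch below is not Git's actual implementation: it compares whole files, whereas Git also merges changes line by line within a file. The names `three_way_merge`, `base`, `ours`, and `theirs` are chosen for this illustration; `base` stands for the last commit that both branches share.

```python
from typing import Dict, List, Tuple

Snapshot = Dict[str, str]  # file name -> file content


def three_way_merge(
    base: Snapshot, ours: Snapshot, theirs: Snapshot
) -> Tuple[Snapshot, List[str]]:
    """Merge two branches file by file; return (merged files, conflicting file names)."""
    merged: Snapshot = {}
    conflicts: List[str] = []
    for name in sorted(set(base) | set(ours) | set(theirs)):
        # None means the file does not exist in that snapshot.
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        if o == t:
            result = o  # both branches agree
        elif o == b:
            result = t  # only their branch changed the file
        elif t == b:
            result = o  # only our branch changed the file
        else:
            conflicts.append(name)  # both changed it differently
            continue
        if result is not None:
            merged[name] = result
    return merged, conflicts


if __name__ == "__main__":
    base = {"README.md": "v1", "notes.txt": "a"}
    main = {"README.md": "v2", "notes.txt": "a"}
    new = {"README.md": "v1", "notes.txt": "b"}
    print(three_way_merge(base, main, new))
    # Both branches edit README.md differently: this needs manual resolution.
    print(three_way_merge(base, main, {"README.md": "v3", "notes.txt": "a"}))
```

In the first call each file was changed on one branch only, so the merge can happen automatically. In the second call both branches changed `README.md`, which is the situation created on purpose in Task 14.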

Task 11: Merge “new”-branch into main via a pull request

Task 12: Delete the “new”-branch with the GitHub web-interface

Optional tasks

Task 13: Review a pull request

Task 14: Create and resolve a merge conflict

Primer on documentation

Why is documentation important?

Without documentation, your future self (let alone other people) won’t be able to read your code easily, and your code won’t be FAIR:

“Software, including its documentation and license, should meet domain-relevant community standards and coding practices (e.g., choice of programming language, standards for testing, usage of file formats, accessibility […]) that enable reuse” (Chue Hong et al. 2022: 13).

In addition to the time and money spent (re)understanding your undocumented code, this potentially means many missed opportunities, including:

Different levels of documentation

Documentation is required at different levels of your research project:

  1. Project level
  2. User level
  3. Systems level

Image credit: xkcd, “Documents”, CC BY-NC 2.5.

Project level documentation

A README file provides information about your files (code, data, and others) and how they are interrelated. The University of Basel’s RDMN provides more resources on data and file organization. The structure of a README file should include:

User level documentation

“IEEE Standard for Software User Documentation” in IEEE Std 1063-1987: 1-20, 10.1109/IEEESTD.1988.121943.

In order to create good user software documentation, answer the following questions:

  1. What part(s) of the software need to be documented?
  2. Who is the audience of the documentation?
  3. What is the information required by the target audience?
  4. What is the usage mode of the documentation?

GitHub best practices

README.md

You can add a README file to your repository to tell other people why your project is useful, what they can do with your project, and how they can use it.

From the GitHub documentation on README files

Materials

Questions

LICENSE.md

Public repositories on GitHub are often used to share open source software. For your repository to truly be open source, you’ll need to license it so that others are free to use, change, and distribute the software.

From the GitHub documentation on licensing a repository

Materials

Questions

CITATION.cff

You can add a CITATION.cff file to the root of a repository to let others know how you would like them to cite your work. The citation file format is plain text with human- and machine-readable citation information.

From the GitHub documentation on CITATION files
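
To give an idea of what such a file contains, here is a minimal sketch that renders a CITATION.cff with the keys required by version 1.2.0 of the Citation File Format (`cff-version`, `message`, `title`, `authors`) plus the optional `version` and `date-released`. The helper `render_citation_cff` and all values (title, author, version, date) are placeholders invented for this example.

```python
import json
from typing import List, Tuple


def render_citation_cff(
    title: str, authors: List[Tuple[str, str]], version: str, date_released: str
) -> str:
    """Render a minimal CITATION.cff; authors are (family name, given name) pairs."""
    # json.dumps produces double-quoted strings, which are also valid YAML.
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f"title: {json.dumps(title)}",
        f"version: {json.dumps(version)}",
        f"date-released: {date_released}",  # expected format: YYYY-MM-DD
        "authors:",
    ]
    for family, given in authors:
        lines.append(f"  - family-names: {json.dumps(family)}")
        lines.append(f"    given-names: {json.dumps(given)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values; replace them with your own project's details.
    print(render_citation_cff("my-research-software", [("Doe", "Jane")], "0.1.0", "2024-04-19"))
```

Saving the printed text as `CITATION.cff` in the root of the repository is enough for a first version; you can of course also write the file by hand.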

Materials

Questions

Releases

GitHub’s way of packaging and providing software to your users.

From GitHub glossary/release

Materials

Questions

Zenodo pipeline

GitHub releases can be archived on Zenodo, which assigns each archived release a DOI.

Materials

CHANGELOG.md

A changelog is a file which contains a curated, chronologically ordered list of notable changes for each version of a project.

From https://keepachangelog.com/en/1.1.0/
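
As a rough sketch of this layout, the following Python function renders one release section in the style recommended by Keep a Changelog: a `## [version] - date` heading followed by the changes grouped by type. The function name `render_release` and the example entries are invented for illustration.

```python
from typing import Dict, List

# Change types suggested by Keep a Changelog, in the order they are rendered here.
CHANGE_TYPES = ["Added", "Changed", "Deprecated", "Removed", "Fixed", "Security"]


def render_release(version: str, date: str, changes: Dict[str, List[str]]) -> str:
    """Render one changelog section; `changes` maps a change type to its entries."""
    unknown = set(changes) - set(CHANGE_TYPES)
    if unknown:
        raise ValueError(f"Unknown change types: {sorted(unknown)}")
    lines = [f"## [{version}] - {date}"]
    for change_type in CHANGE_TYPES:
        entries = changes.get(change_type, [])
        if not entries:
            continue  # skip change types without entries
        lines += ["", f"### {change_type}", ""]
        lines += [f"- {entry}" for entry in entries]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical release notes for a first version of a project.
    print(render_release(
        "0.1.0",
        "2024-04-19",
        {"Added": ["README.md with installation instructions."],
         "Fixed": ["Typo in the license file."]},
    ))
```

New sections are placed at the top of CHANGELOG.md so that the latest version comes first.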

Materials

Questions

GitHub Pages

A static site hosting service designed to host your personal, organization, or project pages directly from a GitHub repository.

From GitHub glossary/GitHub pages

Materials

Questions

Advanced topics: secrets, collaboration, automation

Getting help

References and further reading