Data Management Data Structures

Data structures are often studied along algorithms but they are also essential for organizing data in database management and analytic systems.
In this class we look at fundamental families of data structures that aim at supporting efficient storage and retrieval tasks typical in the above systems.

The families of data structures in which we are interested, and the respective representative instances, are the following:

  • Modern Hash-base Structure – This is an important class of structure for point access. We will look into open addressing (cuckoo, swiss table) and close addressing (TBD) options.
  • Scalable Trees – These structures cover in addition range access and we study examples of compact/efficient trees such as radix tree (ART) and concurrent access structures such as range index (Masstree)
  • Persistent Structures – There are structures that lend themselves well to be operated in-memory but that present a persistent component. We will study structures with In-place updates (B-link* tree) and append-only (Log-Structured Merge Trees)
  • Probabilistic Structures – We look into the use of approximation techniques into data structures. In particular, we look into compact set membership (Bloom Filter) and Approximate Balancing techniques (Skiplist).
  • Compressed/Learned Structures – Similarly to approximation, compression also creates interesting possibilities. We look into lossless approaches in dictionary compression and lossy ones in learned Index (TBD).
  • Each of the five modules is comprised of two classes. There are two classes reserved for student projects presentations.

Details

Code 63112
Type Course
ECTS 5
Site Fribourg
Track(s) T6 – Data Science
Semester S2024

Teaching

Learning Outcomes

Upon successful completion of this class, a student will be able to:

  • Identify and understand the criteria necessary to select the best data structure to organize a given data set
  • Describe a wide array of data organization structures and enumerate their advantages and disadvantages
  • Explain the different access methods that each data structure facilitates, along with their performance implications
  • Write and analyze source code that implements the data structures
Lecturer(s) Alberto Lerner
Language english
Course Page

The course page in ILIAS can be found at https://ilias.unibe.ch/goto_ilias3_unibe_crs_2793415.html.

Schedules and Rooms

Period Weekly
Schedule Thursday, 14:15 - 17:00
Location UniFR, PER21
Room E230

Additional information

Comment

First Lecture
The first lecture will take place on Thursday, 22.02.2024 at 14:15 in UniFR, PER21, room E230.

Pre-requisites
Students must have gone through an undergraduate data structures & algorithms class and be extremely comfortable with a statically typed programming languages (e.g., Go, Rust, C++, C, or Java). The class is heavy on exercises and will use the Go language. The latter is easy to pick up for students comfortable with programming. Therefore, previous hands-on experience in programming is also a strict prerequisite.

Grading
The grading has four elements:

  • Each module above has one programming assignment.
  • The students are also expected to research and present an advanced data structure that’s purposefully built for a specific type of data (e.g., string/text, graph, trajectory, time series, geo, arrays, etc). They are expected to write a small experimental report about the structure and present them in class.
  • Class participation
  • Final Exam