Welcome to the SCALE lab!
Scaling to infinity and beyond!

 

Research

 

The research of our lab revolves around three major directions:

High-Performance & Approximate Data Analytics

The amounts of data we can use to develop insight and knowledge are vast and growing rapidly. Analysing this much data requires substantial computational resources, which today can only be found in high-performance computing infrastructure. Scaling out analyses on supercomputers is therefore key!
In this area, we develop new approaches to high-performance data analytics. We optimize existing analytics algorithms and develop new ones for high-performance computing infrastructure. To provide further efficiency and scalability, we develop novel approximate analytics approaches: by sacrificing a little precision (with provable error bounds) we can accelerate analytics substantially, e.g., by sacrificing less than 0.01% precision, we accelerate analytics by more than one order of magnitude (see ADvaNCE, MOVE and others).
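To make the idea of trading a little precision for speed concrete, here is a minimal sketch. It is not one of our published methods: it simply estimates a mean from a uniform random sample and attaches an error bound from Hoeffding's inequality. The function name `approximate_mean` and all parameters are illustrative assumptions, and the values are assumed to lie in a known range.

```python
import math
import random


def approximate_mean(values, sample_size, lower, upper, delta=0.05, seed=0):
    """Estimate the mean of `values` from a uniform random sample.

    Returns (estimate, error_bound). For values bounded in [lower, upper]
    and sampling with replacement, Hoeffding's inequality guarantees that
    the true mean lies within `error_bound` of the estimate with
    probability at least 1 - delta.
    """
    rng = random.Random(seed)
    sample = [rng.choice(values) for _ in range(sample_size)]
    estimate = sum(sample) / sample_size
    error_bound = (upper - lower) * math.sqrt(
        math.log(2 / delta) / (2 * sample_size)
    )
    return estimate, error_bound


# Illustrative data: one million values in the range [0, 99].
data = [float(i % 100) for i in range(1_000_000)]
est, err = approximate_mean(data, sample_size=10_000, lower=0.0, upper=99.0)
exact = sum(data) / len(data)
print(f"estimate={est:.2f} +/- {err:.2f}, exact={exact:.2f}")
```

The sample touches only 1% of the data, and the bound shrinks with the square root of the sample size, so precision can be bought back by sampling more.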

Spatial Indexing & Analytics

Spatial data is everywhere and is generated in vast amounts, be it from satellite surveys, smart sensors/IoT, GPS traces (including semantically enriched ones), computational fluid dynamics (CFD), medical imaging and many more sources. At the same time, applications such as interactive maps, urban planning, medical diagnostics and the simulation of CFD models depend on the efficient analysis and processing of vast amounts of spatial data.
In this line of work, we develop novel methods to efficiently analyze spatial data in the broadest sense. We work with road network and neuroscience data (modelling parts of the brain) to develop spatial analytics that efficiently extract subsets of large datasets, find intersections between objects in vast amounts of spatial data, and more (see FLAT, OCTOPUS, THERMAL-JOIN, TRANSFORMERS, RUBIK and others). With our work, we aim to support the large-scale analysis of spatial data across applications efficiently and effortlessly.

Data Management on Novel Hardware

Hardware, i.e., CPUs, storage and memory technology and others, evolves at a rapid pace. Understanding in detail the characteristics of new hardware, like the read/write performance characteristics of new storage technology, is key to adapting as well as optimizing data analysis algorithms.

In this line of work we consequently develop and optimize algorithms that enable the efficient and scalable analysis of large amounts of data on novel hardware. We focus in particular on new ideas around storage, e.g., cold storage devices or shingled magnetic recording (SMR) disks for archiving data and its occasional analysis, and computing, e.g., neuromorphic hardware as a scalable and energy-efficient analytics platform.


Applications

 

Our research is always motivated by real-world applications and use cases. It is currently driven by two major areas, scientific applications and spatial analytics.

Scientific Applications & Neuroscience

Scientists across different disciplines produce vast amounts of data through experimentation and simulation. While the amounts of data produced are already so big that they can barely be managed, the problem is certain to become worse as more and more data is generated and collected. A lot of our research is therefore driven by the needs of scientists in general and neuroscientists in particular.

We address the problems of neuroscientists on their quest to understand and simulate the rat brain. More specifically, we work with neuroscientists in the Human Brain Project (http://humanbrainproject.eu) to manage the vast amounts of data they use and produce. Their research, modeling and simulating a fraction of the rat brain, produces terabytes of data. Current solutions are inadequate to manage this data volume and we are thus investigating new methods to index and store it in order to provide efficient and scalable access. A particular problem we are currently addressing is the retrieval of objects in space, i.e., accessing neurons based on their position. While it is simple to index several thousand neurons, the neuroscientists have to do it for several millions or even billions of neurons. We are developing new spatial indexes to solve this problem.
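As a rough illustration of what retrieving neurons by position involves, the sketch below builds a uniform grid over 3D points and answers a box-shaped range query by visiting only the grid cells that overlap the query region. It is a deliberately simple stand-in, not FLAT or any other index from our publications; the class name `GridIndex`, the cell size and the sample points are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import product


class GridIndex:
    """Uniform grid over 3D points (illustrative only)."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)  # cell coordinates -> [(point_id, point)]

    def _cell(self, point):
        # Map a point to the integer coordinates of the cell containing it.
        return tuple(int(c // self.cell_size) for c in point)

    def insert(self, point_id, point):
        self.cells[self._cell(point)].append((point_id, point))

    def range_query(self, low, high):
        """Return ids of points p with low[i] <= p[i] <= high[i] in every dimension."""
        lo_cell, hi_cell = self._cell(low), self._cell(high)
        axes = [range(lo, hi + 1) for lo, hi in zip(lo_cell, hi_cell)]
        hits = []
        for cell in product(*axes):
            # Only cells overlapping the query box are inspected.
            for point_id, point in self.cells.get(cell, ()):
                if all(l <= c <= h for l, c, h in zip(low, point, high)):
                    hits.append(point_id)
        return hits


index = GridIndex(cell_size=10.0)
index.insert("n1", (1.0, 2.0, 3.0))
index.insert("n2", (15.0, 2.0, 3.0))
index.insert("n3", (55.0, 60.0, 70.0))
print(sorted(index.range_query((0.0, 0.0, 0.0), (20.0, 10.0, 10.0))))  # prints ['n1', 'n2']
```

A fixed grid like this works for a few thousand evenly spread points but degrades on dense, skewed data such as brain models, which is the kind of situation that motivates more specialised spatial indexes.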

Spatial Analytics & Road Space

Improving mobility and decreasing congestion are some of the biggest challenges facing cities today. Congestion impacts the daily lives of commuters, as well as businesses and visitors to any city. Sensors, the Internet of Things (IoT), GPS data and other sources of data provide city planners with a wealth of data. The data contains important hints to develop smart transport solutions that reduce congestion as well as to optimise the use of city public transport. Extracting the information and hints in this deluge of data, however, is a challenge due to the size as well as the number of heterogeneous sources.

We work with transport authorities to address these issues. More precisely, we develop the infrastructure to integrate and analyse heterogeneous data sources (data with a spatial aspect, e.g., sensors, GPS, maps, weather radar and others) to enable spatial analytics on it. Spatial analytics is used for applications like city planning and to optimise the use of limited road space (even in real time).

Demos & Visuals

 

We turn as much of our technology as possible – research projects and student projects – into cool demos! Check out our super whizz bang videos of some of our applications below.

Spatial Analytics on Novel Interfaces

The emergence of tablets has substantially changed the way we interact with data. No longer do we use slow and cumbersome scroll bars to look at query results; instead we use touch interfaces which enable us to browse and analyse data extremely fast. Clearly, the underlying data infrastructure, which was optimised for scrolling and thus sequential data access, must now support this fast and rather random access efficiently and scalably.
In this video, we show how our implementation of an index on the iPad (FLAT, NEURO) enables the efficient exploration of data via a touch interface. With a pinch, the user can choose a small subset (query region) of a small model representing several thousand neurons. The app subsequently loads a more detailed representation of the query region so the user can inspect the neurons in more detail.

Virtual Reality Scientific Data Exploration

This video shows our ground-breaking approach to visualising and analysing large-scale scientific models. Using an HTC Vive headset as well as haptic gloves, users can immerse themselves in a detailed model of the brain and analyse it in virtual reality. Walking through the model to look at different parts of the brain, they can use gestures to pan, zoom and select further subsets to study the model in great detail. The video shows a first demo of the visualisation, which can be extended to visualise other models and can also support sophisticated analyses. This particular video shows the message propagation mode: after selecting a subset of the model, messages are injected into the branches crossing that subset. The messages travel along the branches of the neurons and leap over between them. The visualisation helps to understand the connectivity of the brain model.

Energy-efficient Classification on Wearables

Sensors are becoming ever more pervasive and more powerful, meaning that we can perform increasingly complex tasks on them. This video shows our demo of a wearable device which collects data and classifies it using a neural network on the wearable device itself. The neural network is optimised for size (and thus accuracy of classification) as well as energy efficiency. This particular demo classifies physical exercises (e.g., push-ups) and reports the results on a mobile phone. The sensor on the body uses an inertial measurement unit to collect acceleration data. The data is classified on the wearable device using a neural network, and the result is sent to a mobile phone using Bluetooth. The phone gives the user feedback on the quality and quantity of the different exercises done.

Publication Highlights

 

Below is a selection of our publications. Click here for a complete list.

  • ICDE 2016

    TRANSFORMERS

    Spatial joins are becoming increasingly ubiquitous in many applications, particularly in the scientific domain. While several approaches have been proposed for joining spatial datasets, no single method can efficiently join two spatial datasets in a robust manner with respect to their data distributions. Some approaches do well for datasets with contrasting densities while others do better with similar densities. None of them does well when the datasets have divergent data distributions. In this paper we develop TRANSFORMERS, an efficient and robust spatial join approach that is indifferent to such variations of distribution among the joined data. TRANSFORMERS achieves this feat by departing from the state-of-the-art through adapting the join strategy and data layout to local density variations among the joined data. We experimentally demonstrate that TRANSFORMERS outperforms state-of-the-art approaches by a factor of between 2 and 8.

  • ADBIS 2016

    ADvaNCE

    Analyzing massive amounts of data and extracting value from it has become key across different disciplines. As the amounts of data grow rapidly, however, current approaches for data analysis struggle. This is particularly true for clustering algorithms where distance calculations between pairs of points dominate overall time. Crucial to the data analysis and clustering process, however, is that it is rarely straightforward. Instead, parameters need to be determined through several iterations. Entirely accurate results are thus rarely needed and instead we can sacrifice precision of the final result to accelerate the computation. In this paper we develop ADvaNCE, a new approach to approximating DBSCAN. ADvaNCE uses two measures to reduce distance calculation overhead: (1) locality sensitive hashing to approximate and speed up distance calculations and (2) representative point selection to reduce the number of distance calculations. Our experiments show that our approach is in general one order of magnitude faster (at most 30x in our experiments) than the state of the art.

  • SIGMOD 2015

    THERMAL-JOIN

    A scalable spatial join for dynamic workloads, presented at the 2015 ACM SIGMOD International Conference on Management of Data (Melbourne, Victoria, Australia).
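The ADvaNCE highlight above mentions locality sensitive hashing as a way to cut down on distance calculations. The sketch below shows the general idea only and is not the ADvaNCE algorithm: points are grouped by a Euclidean (p-stable) LSH signature, and exact distances are computed only between points that land in the same bucket. The function names, the number of hash functions and the bucket width are illustrative assumptions.

```python
import math
import random
from collections import defaultdict


def lsh_buckets(points, num_hashes=4, bucket_width=1.0, seed=0):
    """Group point indices by a Euclidean (p-stable) LSH signature."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Each hash is h(x) = floor((a . x + b) / w), with Gaussian a and uniform b.
    hashes = [
        ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, bucket_width))
        for _ in range(num_hashes)
    ]
    buckets = defaultdict(list)
    for idx, p in enumerate(points):
        signature = tuple(
            math.floor((sum(a_i * x_i for a_i, x_i in zip(a, p)) + b) / bucket_width)
            for a, b in hashes
        )
        buckets[signature].append(idx)
    return buckets


def approximate_neighbor_pairs(points, eps, **lsh_args):
    """Return pairs within distance eps, checking only pairs that share a bucket."""
    pairs = set()
    for members in lsh_buckets(points, **lsh_args).values():
        for i, a in enumerate(members):
            for b in members[i + 1:]:
                if math.dist(points[a], points[b]) <= eps:
                    pairs.add((a, b))
    return pairs


points = [(0.0, 0.0), (0.0, 0.0), (5.0, 5.0)]
print(approximate_neighbor_pairs(points, eps=0.5))
```

Every returned pair is a true neighbor pair because distances are verified exactly; the approximation lies in the pairs that are never compared, since nearby points can occasionally fall into different buckets.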

Team & Jobs

 

Teamwork makes the dream work! We are always looking for talented and driven individuals who want to join our team. Get in touch if you are interested. If you are interested in joining as a PhD student, see here. For additional funding opportunities, see here. We also have regular openings for PostDocs (advertised on the usual job portals). See here for additional funding opportunities for postdoctoral research. 

Thomas Heinis
Principal Investigator
Dong-Wan Choi
PostDoc
Ehab Abdelhamid
PostDoc
Ahsan Javed Awan
PostDoc
Giannis Evagorou
PhD Student
Ali Hadian
PhD Student
Valentin Clement
PhD Student

Teaching

 

We deliver courses centering around data management at Imperial College. More precisely, we organise and teach Databases I (CO130) as well as Large Scale Data Management (CO412H). Have a look at the anonymous student feedback below to see how we are doing.

STUDENT TESTIMONIALS

  • I liked his jokes.
    CO130 Student
  • The lecturer did a great job on patiently explaining the concepts to us, thank you Dr Heinis.
    CO130 Student
  • Dr Heinis lectured this course very well.
    CO130 Student
  • ...I love your self deprecating humour and your down-to-earth attitude and I can wholeheartedly say that I really enjoyed the lectures.
    CO130 Student
  • Thank you very much for an excellent lecture. The work and topics covered has already given me an ’edge’ in my current work environment.
    Student

Contact Us

 

Get in touch with us! No matter if you have questions about our research, if you want to join our team or in case you want to explore opportunities for collaboration, use the form below to contact us or send a message to doc-scalelab@imperial.ac.uk!

Ph.D. Funding Opportunities

We are always looking for driven and talented Ph.D. applicants interested in developing novel data management techniques that are deployed and used across different disciplines. We are particularly looking for students with a strong background in data management and ideally also with a background in a different field (life sciences, natural sciences, etc.).

Most funding opportunities are for European Union students, but there are several opportunities for overseas applicants as well:



You can find more information about the PhD program in the Department of Computing at Imperial College London, including funding opportunities here.

2017

Oehmichen, Axel; Guitton, Florian; Sun, Kai; Grizet, Jean; Heinis, Thomas; Guo, Yike

eTRIKS Analytical Environment: A Modular High Performance Framework for Medical Data Analysis Conference

Proceedings of the IEEE Big Data Conference, Boston, MA, USA, December 10-14, 2017, 2017.

Heinis, Thomas; Chapman, Adriane

Provenance Storage Book Chapter

Encyclopedia of Database Systems, Springer, 2017.

Choi, Dongwan; Pei, Jiang; Heinis, Thomas

Efficient Mining of Regional Movement Patterns in Semantic Trajectories Journal Article

PVLDB, 10 (1), 2017.

Heinis, Thomas

Neuromorphic Hardware As Database Co-Processors Conference

CIDR 2017, 8th Biennial Conference on Innovative Data Systems Research, Chaminade, CA, USA, January 8-11, 2017, Online Proceedings, 2017.

Olma, Matthaios; Tauheed, Farhan ; Heinis, Thomas ; Ailamaki, Anastasia

BLOCK: Efficient Execution of Spatial Range Queries in Main-Memory Conference

Proceedings of the 29th International Conference on Scientific and Statistical Database Management, Chicago, IL, USA, June 27-29, 2017, 2017.

Li, Tianrun ; Heinis, Thomas ; Luk, Wayne

ADvaNCE - Efficient and Scalable Approximate Density-Based Clustering Based on Hashing Journal Article

Informatica, 28 (1), pp. 105–130, 2017, ISBN: 1822-8844.

Evagorou, Giannis; Heinis, Thomas

STATS - A Point Access Method for Multidimensional Clusters Inproceedings

Database and Expert Systems Applications - 28th International Conference, DEXA 2017, Lyon, France, August 28-31, 2017, Proceedings, Part I, pp. 352–361, 2017.

Heinis, Thomas; Ailamaki, Anastasia

Data Infrastructure for Medical Research Book

2017, ISSN: 1931-7883.

2016

Li, Tianrun; Heinis, Thomas ; Luk, Wayne

Hashing-Based Approximate DBSCAN Conference

Advances in Databases and Information Systems - 20th East European Conference, ADBIS 2016, Prague, Czech Republic, August 28-31, 2016, Proceedings, 2016.

Pavlovic, Mirjana; Heinis, Thomas ; Tauheed, Farhan ; Karras, Panagiotis ; Ailamaki, Anastasia

TRANSFORMERS: Robust Spatial Joins on Non-uniform Data Distributions Conference

32nd IEEE International Conference on Data Engineering, ICDE 2016, Helsinki, Finland, May 16-20, 2016, 2016.

Pavlovic, Mirjana; Zacharatou, Eleni Tzirita ; Sidlauskas, Darius ; Heinis, Thomas ; Ailamaki, Anastasia

Space Odyssey: Efficient Exploration of Scientific Data Conference

Proceedings of the Third International Workshop on Exploratory Search in Databases and the Web, San Francisco, CA, USA, July 1, 2016, 2016.

Magalhães, Bruno R. C.; Tauheed, Farhan; Heinis, Thomas; Ailamaki, Anastasia; Schürmann, Felix

An Efficient Parallel Load-Balancing Framework for Orthogonal Decomposition of Geometrical Data Conference

High Performance Computing - 31st International Conference, ISC High Performance 2016, Frankfurt, Germany, June 19-23, 2016, Proceedings, 2016.

2015

Heinis, Thomas; Ham, David A

On-the-Fly Data Synopses: Efficient Data Exploration in the Simulation Sciences Journal Article

SIGMOD Record, 44 (2), pp. 23–28, 2015.

Tauheed, Farhan; Heinis, Thomas ; Ailamaki, Anastasia

Configuring Spatial Grids for Efficient Main Memory Joins Conference

Data Science - 30th British International Conference on Databases, BICOD 2015, Edinburgh, UK, July 6-8, 2015, Proceedings, 2015.

Venetis, Tassos; Ailamaki, Anastasia ; Heinis, Thomas ; Karpathiotakis, Manos ; Kherif, Ferath ; Mitelpunkt, Alexis ; Vassalos, Vasilis

Towards the Identification of Disease Signatures Conference

Brain Informatics and Health - 8th International Conference, BIH 2015, London, UK, August 30 - September 2, 2015. Proceedings, 2015.

Karpathiotakis, Manos; Alagiannis, Ioannis ; Heinis, Thomas ; Branco, Miguel ; Ailamaki, Anastasia

Just-In-Time Data Virtualization: Lightweight Data Management with ViDa Conference

CIDR 2015, Seventh Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, January 4-7, 2015, Online Proceedings, 2015.

Heinis, Thomas; Ailamaki, Anastasia

Reconsolidating Data Structures Conference

Proceedings of the 18th International Conference on Extending Database Technology, EDBT 2015, Brussels, Belgium, March 23-27, 2015., 2015.

Tauheed, Farhan; Heinis, Thomas ; Ailamaki, Anastasia

THERMAL-JOIN: A Scalable Spatial Join for Dynamic Workloads Conference

Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, May 31 - June 4, 2015, 2015.

Zacharatou, Eleni Tzirita; Tauheed, Farhan ; Heinis, Thomas ; Ailamaki, Anastasia

RUBIK: Efficient Threshold Queries on Massive Time Series Conference

Proceedings of the 27th International Conference on Scientific and Statistical Database Management, SSDBM '15, La Jolla, CA, USA, June 29 - July 1, 2015, 2015.

2014

Heinis, Thomas

Data analysis: Approximation aids handling of big data Journal Article

Nature, 515 (7526), pp. 198, 2014.

Heinis, Thomas; Tauheed, Farhan ; Ailamaki, Anastasia

Spatial Data Management Challenges in the Simulation Sciences Conference

Proceedings of the 17th International Conference on Extending Database Technology, EDBT 2014, Athens, Greece, March 24-28, 2014., 2014.

Tauheed, Farhan; Heinis, Thomas; Schürmann, Felix; Markram, Henry; Ailamaki, Anastasia

OCTOPUS: Efficient query execution on dynamic mesh datasets Conference

IEEE 30th International Conference on Data Engineering, Chicago, ICDE 2014, IL, USA, March 31 - April 4, 2014, 2014.

2013

Heinis, Thomas; Tauheed, Farhan ; Pavlovic, Mirjana ; Ailamaki, Anastasia

Enabling Scientific Discovery Via Innovative Spatial Data Management Journal Article

IEEE Data Eng. Bull., 36 (4), pp. 3–10, 2013.

Tauheed, Farhan; Nobari, Sadegh ; Biveinis, Laurynas ; Heinis, Thomas ; Ailamaki, Anastasia

Computational Neuroscience Breakthroughs through Innovative Data Management Conference

Advances in Databases and Information Systems - 17th East European Conference, ADBIS 2013, Genoa, Italy, September 1-4, 2013. Proceedings, 2013.

Stougiannis, Alexandros; Tauheed, Farhan ; Heinis, Thomas ; Ailamaki, Anastasia

Accelerating Spatial Range Queries Conference

Joint 2013 EDBT/ICDT Conferences, EDBT '13 Proceedings, Genoa, Italy, March 18-22, 2013, 2013.

Nobari, Sadegh; Tauheed, Farhan; Heinis, Thomas; Karras, Panagiotis; Bressan, Stéphane; Ailamaki, Anastasia

TOUCH: In-memory Spatial Join by Hierarchical Data-oriented Partitioning Conference

Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, New York, NY, USA, June 22-27, 2013, 2013.

Stougiannis, Alexandros; Pavlovic, Mirjana ; Tauheed, Farhan ; Heinis, Thomas ; Ailamaki, Anastasia

Data-driven Neuroscience: Enabling Breakthroughs via Innovative Data Management Conference

Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, New York, NY, USA, June 22-27, 2013, 2013.

Pavlovic, Mirjana; Tauheed, Farhan ; Heinis, Thomas ; Ailamaki, Anastasia

GIPSY: Joining Spatial Datasets with Contrasting Density Conference

Conference on Scientific and Statistical Database Management, SSDBM '13, Baltimore, MD, USA, July 29 - 31, 2013, 2013.

2012

Tauheed, Farhan; Heinis, Thomas; Schürmann, Felix; Markram, Henry; Ailamaki, Anastasia

SCOUT: Prefetching for Latent Feature Following Queries Journal Article

PVLDB, 5 (11), pp. 1531–1542, 2012.

Tauheed, Farhan; Biveinis, Laurynas; Heinis, Thomas; Schürmann, Felix; Markram, Henry; Ailamaki, Anastasia

Accelerating Range Queries for Brain Simulations Conference

IEEE 28th International Conference on Data Engineering (ICDE 2012), Washington, DC, USA (Arlington, Virginia), 1-5 April, 2012, 2012.

Tauheed, Farhan; Heinis, Thomas; Schürmann, Felix; Markram, Henry; Ailamaki, Anastasia

SCOUT: Prefetching for Latent Feature Following Queries Journal Article

CoRR, abs/1208.0276 , 2012.

2011

Heinis, Thomas; Branco, Miguel ; Alagiannis, Ioannis ; Borovica, Renata ; Tauheed, Farhan ; Ailamaki, Anastasia

Challenges and Opportunities in Self-Managing Scientific Databases Journal Article

IEEE Data Eng. Bull., 34 (4), pp. 44–52, 2011.

2010

Maier, Cristina; Dash, Debabrata ; Alagiannis, Ioannis ; Ailamaki, Anastasia ; Heinis, Thomas

PARINDA: an Interactive Physical Designer for PostgreSQL Conference

EDBT 2010, 13th International Conference on Extending Database Technology, Lausanne, Switzerland, March 22-26, 2010, Proceedings, 2010.

2008

Heinis, Thomas; Pautasso, Cesare

Automatic Configuration of an Autonomic Controller: An Experimental Study with Zero-Configuration Policies Conference

2008 International Conference on Autonomic Computing, ICAC 2008, June 2-6, 2008, Chicago, Illinois, USA, 2008.

Heinis, Thomas; Alonso, Gustavo

Efficient Lineage Tracking for Scientific Workflows Conference

Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2008, Vancouver, BC, Canada, June 10-12, 2008, 2008.

2007

Pautasso, Cesare; Heinis, Thomas ; Alonso, Gustavo

Autonomic resource provisioning for software business processes Journal Article

Information & Software Technology, 49 (1), pp. 65–80, 2007.

2006

Pautasso, Cesare; Heinis, Thomas ; Alonso, Gustavo

JOpera: Autonomic Service Orchestration Journal Article

IEEE Data Eng. Bull., 29 (3), pp. 32–39, 2006.

Tsalgatidou, Aphrodite; Athanasopoulos, George; Pantazoglou, Michael; Pautasso, Cesare; Heinis, Thomas; Grønmo, Roy; Hoff, Hjørdis; Berre, Arne-Jørgen; Glittum, M; Topouzidou, Simela

Developing scientific workflows from heterogeneous services Journal Article

SIGMOD Record, 35 (2), pp. 22–28, 2006.

Heinis, Thomas; Pautasso, Cesare ; Alonso, Gustavo

Mirroring Resources or Mapping Requests: Implementing WS-RF for Grid Workflows Conference

Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2006), 16-19 May 2006, Singapore, 2006.

2005

Heinis, Thomas; Pautasso, Cesare ; Deak, Oliver ; Alonso, Gustavo

Publishing Persistent Grid Computations as WS Resources Conference

First International Conference on e-Science and Grid Technologies (e-Science 2005), 5-8 December 2005, Melbourne, Australia, 2005.

Heinis, Thomas; Pautasso, Cesare ; Alonso, Gustavo

Design and Evaluation of an Autonomic Workflow Engine Conference

Second International Conference on Autonomic Computing (ICAC 2005), 13-16 June 2005, Seattle, WA, USA, 2005.

Pautasso, Cesare; Heinis, Thomas ; Alonso, Gustavo

Autonomic Execution of Web Service Compositions Conference

2005 IEEE International Conference on Web Services (ICWS 2005), 11-15 July 2005, Orlando, FL, USA, 2005.

PostDoc Funding Opportunities

If you are interested in collaborating with our group based on an externally-funded scholarship, for example a Marie-Curie post-doctoral fellowship, as an experienced researcher, please get in contact with us. Several other opportunities for fellowships (including partial ones) are listed below. Please send a message to t.heinis@imperial.ac.uk if you are interested.


Applying for a Ph.D. position

We are looking for aspiring researchers who want to pursue a Ph.D. (in 3 to 3.5 years) in the broad area of scientific data management. The group focuses on scientific data management and high-impact interdisciplinary research, i.e., developing ground-breaking and novel data management techniques that are strongly motivated by and used in other disciplines (see examples of past research here and demos here). The research interests of a successful applicant have to overlap considerably with the group's interests:


  • Big Data, Distributed Indexing & Processing
  • Scientific Data Management
  • Spatial Data, Spatial Indexing
  • Spatio-Temporal Indexing
  • High-dimensional Indexing/Clustering
  • In-Memory Indexing

To apply you will need to have a strong background in computer science (M.Sc. or B.Sc. in Computer Science or very closely related) and ideally solid experience with data management. Given the interdisciplinary nature of our group's research, the ideal candidate also has a background in a different discipline.

You must have excellent communication skills and prioritise work to meet deadlines. All applicants must be fluent in spoken and written English. Preference will be given to applicants with publications in the relevant areas.
How to apply: please send a message to t.heinis@imperial.ac.uk
Applications must include the following:

  1. A full CV
  2. Scan of your transcripts of your studies
  3. Contact information for 2 references who have agreed to speak about you, your work, and your potential
Starting date: as soon as possible
Closing date: open

About Imperial College and London

Imperial College is a first-class address for pursuing excellent, high-impact research. Imperial College consistently ranks among the top 5 schools in the world (Times Higher Education & QS rankings). The Department of Computing is also a leading department of Computer Science among UK universities. It has consistently been awarded the highest research rating (5*) in Research Assessment Exercises (RAE), coming 2nd in the 2008 RAE, and was rated as "Excellent" in the previous national assessment of teaching quality.

Noisy, vibrant and truly multicultural, London is a megalopolis of people, ideas and frenetic energy. The capital and largest city of both the United Kingdom and of England, it is also the largest city in Western Europe and the European Union. Situated on the River Thames, London is an international capital of culture, music, education, fashion, politics, finance and trade which offers ample activities (besides research, that is) for every interest, be it culture, sports events, shopping or clubbing.

We gratefully acknowledge the sponsors of our research