SS1 : Privacy and Dark Patterns

Tutor: Sanna Suoranta (sanna.suoranta@aalto.fi)

Users' privacy should have improved after the EU's GDPR came into force. However, users can be lured into granting more rights to their data than they initially intended. How does this happen, and how could we prevent it?

References:

  • Christoph Bösch, Benjamin Erb, Frank Kargl, Henning Kopp, and Stefan Pfattheicher (2016) Tales from the Dark Side: Privacy Dark Strategies and Privacy Dark Patterns. Proceedings on Privacy Enhancing Technologies, pp. 237-254. DOI 10.1515/popets-2016-003
  • Waldman, A. E. (2020) Cognitive biases, dark patterns, and the 'privacy paradox'. Current opinion in psychology. vol 31, feb 2020, pp. 105-109, https://doi.org/10.1016/j.copsyc.2019.08.025


SS2: What do users really think while trying to use something that requires security?

Tutor: Sanna Suoranta (sanna.suoranta@aalto.fi)

Nowadays, brain imaging technology can tell us what users are really seeing, thinking, and feeling while they use a user interface, based on which brain areas are activated. What happens when a user sees a phishing message? How could we help users act in secure ways by providing better user interfaces?


References:

  • Zhepeng Rui and Zhenyu Gu (2021) A Review of EEG and fMRI Measuring Aesthetic Processing in Visual User Experience Research. Computational Intelligence and Neuroscience. Vol 2021. https://doi.org/10.1155/2021/2070209
  • Ajaya Neupane, Nitesh Saxena, Jose Omar Maximo, and Rajesh Kana (2016) Neural Markers of Cybersecurity: An fMRI Study of Phishing and Malware Warnings. IEEE Transactions on Information Forensics and Security, vol. 11, no. 9, September 2016. DOI 10.1109/TIFS.2016.2566265
  • Ajaya Neupane, Nitesh Saxena, Keya Kuruvilla, Michael Georgescu, and Rajesh Kana (2014), NDSS'14, https://nsaxena.engr.tamu.edu/wp-content/uploads/sites/238/2019/12/nskgk-ndss14.pdf
  • Vance, A., Jenkins, J. L., Anderson, B. B., Bjornn, D. K., & Kirwan, C. B. (2018). Tuning out security warnings: A longitudinal examination of habituation through fMRI, eye tracking, and field experiments. Management Information Systems Quarterly, 43(2), 1-26. https://doi.org/10.25300/MISQ/2018/14124


SS3: Cookie consent forms

Tutor: Sanna Suoranta (sanna.suoranta@aalto.fi)

All webpages ask their users to accept cookies. Does this really protect users who do not want to share their actions with web services?

References:

  • Dino Bollinger, Karel Kubicek, Carlos Cotrini, and David Basin (2022) Automating Cookie Consent and GDPR Violation Detection. Proceedings of the 31st USENIX Security Symposium, August 10–12, 2022, Boston, MA, USA. https://www.usenix.org/system/files/sec22-bollinger.pdf
  • Cristiana Santos, Nataliia Bielova, Célestin Matte. Are cookie banners indeed compliant with the law? arXiv:1912.07144 [cs.CR]. https://doi.org/10.48550/arXiv.1912.07144
  • Chiara Krisam, Heike Dietmann, Melanie Volkamer, Oksana Kulyk. Dark Patterns in the Wild: Review of Cookie Disclaimer Designs on Top 500 German Websites. EuroUSEC '21: Proceedings of the 2021 European Symposium on Usable Security, October 2021, pp. 1–8. https://doi.org/10.1145/3481357.3481516


XW: Model-based Learning to Optimize (L2O): Applications Survey

Tutor: Xinjue Wang (xinjue.wang@aalto.fi)

Learning to Optimize (L2O) stands at the intersection of machine learning and optimization. While traditional optimization techniques are rooted in theory, L2O uses machine learning to craft optimization methods, aiming to reduce the tedious manual engineering of iterative schemes. By training on specific problem sets, L2O produces optimization methods tailored to solve similar problems effectively. Its success hinges on factors such as the type of optimization problem, the architecture of the learned method, and the training process. A variety of modern challenges have been tackled using L2O:

  • Image restoration and reconstruction: L2O techniques have transformed tasks like image denoising, deblurring, super-resolution, and inpainting.
  • Medical imaging: This field demands precision, especially in modalities like MRI and CT imaging. L2O presents an innovative solution, dealing even with complex-valued inputs, to offer accurate image reconstructions.
  • Wireless communication: Here, L2O helps in areas such as resource management, signal detection, and LDPC coding. Notably, in MIMO detection, L2O methods have shown resilience to low signal-to-noise ratios and other challenges.

This student project aims to conduct a comprehensive survey of the applications of model-based L2O techniques. Beyond the survey, there is an opportunity for further in-depth research by introducing an innovative L2O framework tailored to image datasets with distinct patterns.

Prerequisites: Basic understanding of classical optimization methods, deep learning, and PyTorch programming.
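To make the core idea concrete, here is a deliberately tiny sketch (not from the references; the quadratic problem family, the candidate grid, and the single "learned" parameter are all illustrative assumptions): the "optimizer" is plain gradient descent whose step size is fitted on a set of training problems and then reused on similar, unseen ones.

```python
import random

def gd(a, step, k=20, x0=5.0):
    # Run k gradient-descent steps on f(x) = 0.5 * a * x**2 (gradient: a*x).
    x = x0
    for _ in range(k):
        x -= step * a * x
    return 0.5 * a * x * x  # final loss

random.seed(0)
train_as = [random.uniform(0.5, 2.0) for _ in range(50)]  # "training problems"

# "Learning" phase: pick the step size that minimizes the final loss
# averaged over the training problem set (a crude stand-in for training
# a neural optimizer on a task distribution).
candidates = [i / 100 for i in range(1, 100)]
learned_step = min(candidates, key=lambda s: sum(gd(a, s) for a in train_as))

# The learned optimizer transfers to a similar, unseen problem and
# beats a naive default step size by a wide margin.
assert gd(1.3, learned_step) < gd(1.3, 0.01)
```

Real L2O methods replace the scalar step size with a trained network (e.g., an unrolled iteration as in ISTA-Net), but the train-on-a-problem-distribution, deploy-on-similar-problems pattern is the same.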

References:

  • Chen, T., Chen, X., Chen, W., Wang, Z., Heaton, H., Liu, J., & Yin, W. (2022). Learning to optimize: A primer and a benchmark. The Journal of Machine Learning Research, 23(1), 8562-8620.
  • Monga, V., Li, Y., & Eldar, Y. C. (2021). Algorithm unrolling: Interpretable, efficient deep learning for signal and image processing. IEEE Signal Processing Magazine, 38(2), 18-44.
  • Zhang, J., & Ghanem, B. (2018). ISTA-Net: Interpretable optimization-inspired deep network for image compressive sensing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1828-1837).
  • Xiang, J., Dong, Y., & Yang, Y. (2021). FISTA-Net: Learning a fast iterative shrinkage thresholding network for inverse problems in imaging. IEEE Transactions on Medical Imaging, 40(5), 1329-1339.
  • Ma, J., Liu, X. Y., Shou, Z., & Yuan, X. (2019). Deep tensor ADMM-Net for snapshot compressive imaging. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 10223-10232).


BL1: Adversarial perturbation attacks

Tutor: Blerta Lindqvist  (blerta.lindqvist@aalto.fi)

This topic is about evasion attacks, and it is flexible depending on student interest. The work can be theoretical or include code experiments, for example, reproducing a paper's results or trying a new attack. Pairs of students can also work together, provided they show that it is a team effort with collaboration.
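As a concrete illustration of an evasion attack, the following sketch applies an FGSM-style perturbation to a hand-made logistic classifier; the weights, input, and epsilon are arbitrary toy values chosen for illustration, not taken from the reading list.

```python
import math

# Hypothetical fixed linear classifier: p(y=1|x) = sigmoid(w.x + b)
w, b = [2.0, -1.5], 0.3

def predict(x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 / (1 + math.exp(-z))

def fgsm(x, y, eps):
    # For a logistic model, the gradient of the log-loss w.r.t. the
    # input is (p - y) * w; FGSM moves each feature by eps in the
    # sign of that gradient to maximize the loss.
    p = predict(x)
    return [xi + eps * math.copysign(1.0, (p - y) * wi)
            for xi, wi in zip(x, w)]

x = [1.0, 0.2]               # classified as class 1 (p > 0.5)
x_adv = fgsm(x, 1, eps=1.5)  # large eps for a visible effect
assert predict(x) > 0.5 and predict(x_adv) < 0.5  # prediction flipped
```

Against deep networks, the gradient comes from backpropagation rather than a closed form, but the perturbation rule is the same.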

Reference:

  • A list with a focus on evasion attacks and defenses https://nicholas.carlini.com/writing/2018/adversarial-machine-learning-reading-list.html



BL2: Adversarial defenses against adversarial perturbation attacks

Tutor: Blerta Lindqvist (blerta.lindqvist@aalto.fi)

This topic is about adversarial defenses against evasion attacks, and it is flexible depending on student interest. The work can be theoretical or include code experiments, for example, reproducing a paper's results or trying a new defense. Pairs of students can also work together, provided they show that it is a team effort with collaboration.

References:

  • A list with a focus on evasion attacks and defenses https://nicholas.carlini.com/writing/2018/adversarial-machine-learning-reading-list.html


SR: Microservices - when and how to use them

Tutor: Sara Ranjbaran (sara.ranjbaran@aalto.fi)

Microservice architecture is a modern approach to software system design in which the functionality of the system is divided into small independent units. Microservice systems differ from more traditional monolithic systems in many ways, some of which are unexpected. Microservices have become very popular in recent years, and an increasing number of companies (e.g., Amazon, Netflix, LinkedIn) are dismantling their existing monolithic applications in favor of distributed microservice systems. As with any big software project, however, the cost of migrating from a monolithic system to one based on microservices is often substantial, so the decision needs to be carefully evaluated. In this project work, you will discuss the benefits and drawbacks of adopting a microservice architecture in comparison to a monolithic architecture. Depending on your interests, possible viewpoints on this generic topic include, for example, security challenges, scalability, serverless computing, service meshes, and big data platforms.

References:

    • Pooyan Jamshidi, Claus Pahl, Nabor C. Mendonça, James Lewis, and Stefan Tilkov. Microservices: The journey so far and challenges ahead. IEEE Software, 35(3):24–35, 2018.
    • Nicola Dragoni, Saverio Giallorenzo, Alberto Lluch Lafuente, Manuel Mazzara, Fabrizio Montesi, Ruslan Mustafin, and Larisa Safina. Microservices: Yesterday, Today, and Tomorrow, pages 195–216. Springer International Publishing, Cham, 2017.


RM: Understanding prompt generator for AI

Tutor: Rongjun Ma (rongjun.ma@aalto.fi)


A prompt generator is a tool that generates prompts for interacting with AI-powered natural language processing models. These prompts are essential for instructing the AI model on what task or response you want it to generate. Nowadays, prompt generators are used in many cases, such as drafting instructions for Midjourney. This project aims to explore how prompt generators work and how they help human-machine communication. Students can choose one use case of a prompt generator, such as a Midjourney text generator or information retrieval prompts. During the course, students will conduct a literature review on one use case to understand how different types of prompts work, analyze the existing frameworks, and propose design implications.

Reference: 

  •  Ruskov, M. (2023). Grimm in Wonderland: Prompt Engineering with Midjourney to Illustrate Fairytales. arXiv preprint arXiv:2302.08961. 
  •  Liu, V., & Chilton, L. B. (2022, April). Design guidelines for prompt engineering text-to-image generative models. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (pp. 1-23)


AZ1: Generating a realistic visual crowdsourced dataset using Carla simulator for autonomous driving

Tutor: Aziza Zhanabatyrova (zhanabatyrova.aziza@aalto.fi)

The task is to write a script that generates a dataset with some characteristics of crowdsourced data: for example, multiple vehicles with different or random camera placements and camera parameters, and varied scenarios and scenes. Provide a document explaining your findings.

Reference: 
  • https://carla.org/ 
  •  https://github.com/carla-simulator/carla


AZ2: Survey on robust camera pose estimation methods

Tutor: Aziza Zhanabatyrova (zhanabatyrova.aziza@aalto.fi)


Accurate camera relocalization plays an important role in autonomous driving. Given query images, the camera relocalization task aims at estimating the camera poses from which the images were taken. The task is to write a survey on recent methods that are robust to noisy data, incomplete data, or other challenges.

Reference: 
  • https://ieeexplore.ieee.org/document/4676964 
  •  https://arxiv.org/abs/2211.11238


MG: Enhancing the usability of mobile health apps (or websites) designed for older adults

Tutor: Maedeh Ghorbanian Zolbin (maedeh.ghorbanianzolbin@aalto.fi)


This topic focuses on designing mobile health apps (or websites) around the specific characteristics of older adults to meet their unique needs. It will answer the following questions: (1) What challenges do older adults face when using current mobile health apps (or websites)? (2) How can the user interface and overall design of mobile health apps (or websites) be improved to better meet the needs of older adults and accommodate their cognitive and physical abilities?

Reference: 
  •  https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8837196/
  • https://journals.sagepub.com/doi/full/10.1177/1064804619840731


ES1: Calibration in quantized LLMs

Tutor: Erik Schultheis (erik.schultheis@aalto.fi)

Recently, heavily quantized large language models (LLMs) have become very popular, as they enable high-quality inference on low-cost hardware. However, quantization comes with a noticeable drop in model quality. The idea of the project is to look at this problem from the point of view of model calibration, that is, for each context X with predicted token T in V, for each v in V: P[T=v | phi(X)_v=p] = p. The survey part would look at LLMs in general (a very deep understanding is not necessary, though) and at model calibration (measures of calibration and algorithms for calibrating). For the hands-on part, the student would run several quantized versions with different weight resolutions and report calibration measures, then run some off-the-shelf calibration algorithms and see whether (part of) the performance drop can be recovered. If the results are positive, this might also be an interesting topic for a subsequent master's thesis.
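One common calibration measure the survey part would likely cover is the expected calibration error (ECE). A minimal binned-ECE sketch, assuming top-label confidences and 0/1 correctness indicators (the toy data below are illustrative):

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    # Bin predictions by confidence; ECE is the size-weighted average gap
    # between mean confidence and empirical accuracy within each bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        acc = sum(ok for _, ok in b) / len(b)
        ece += len(b) / n * abs(avg_conf - acc)
    return ece

# Perfectly calibrated toy data: 80% confidence, 80% accuracy -> ECE ~ 0
confs, hits = [0.8] * 10, [1] * 8 + [0] * 2
assert expected_calibration_error(confs, hits) < 1e-9
# Overconfident data: 90% confidence but 50% accuracy -> ECE ~ 0.4
assert abs(expected_calibration_error([0.9] * 10, [1] * 5 + [0] * 5) - 0.4) < 1e-9
```

For an LLM, `confidences` would be the predicted probability of each sampled token and `correct` whether that token matched the held-out text.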

Reference: 

  • Llama 2 paper: https://arxiv.org/abs/2307.09288
  • Calibration in scikit-learn: https://scikit-learn.org/stable/modules/calibration.html
  • Multiclass calibration: https://arxiv.org/pdf/1902.06977.pdf (might not be necessary for the project, but could be looked at if the student is interested)


ES2: Adaptive repeat penalties for LLMs

Tutor: Erik Schultheis (erik.schultheis@aalto.fi)

Text generation with large language models is based on a large neural network that predicts scores for each possible next token, but in practice, selecting the next token is more complicated than just sampling according to the induced probability distribution. The main part of the project is to survey several existing heuristics for next-token selection. For the hands-on part, experimenting with an adaptive repetition penalty could be interesting: current heuristics penalize repeating the same token within a given time frame, which improves the generated text but also often leads to quite long sentences (as the "." token is penalized). An extension of this heuristic that takes the repetition frequency into account might be able to improve this.
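A hypothetical sketch of the proposed extension (the frequency scaling rule and parameter values are assumptions for illustration, not an existing llama.cpp option): instead of a flat penalty for any recently seen token, the logit adjustment grows with the token's repetition count in a sliding window.

```python
import math

def penalized_probs(logits, recent_tokens, penalty=1.5, window=16):
    # Frequency-aware variant of the usual repeat penalty: divide each
    # positive logit by penalty ** count (multiply if negative), so
    # tokens repeated more often in the window are pushed down harder.
    counts = {}
    for t in recent_tokens[-window:]:
        counts[t] = counts.get(t, 0) + 1
    adj = []
    for i, z in enumerate(logits):
        scale = penalty ** counts.get(i, 0)
        adj.append(z / scale if z > 0 else z * scale)
    m = max(adj)                       # softmax with max-shift for stability
    exps = [math.exp(z - m) for z in adj]
    s = sum(exps)
    return [e / s for e in exps]

probs = penalized_probs([2.0, 1.0, 0.5], recent_tokens=[0, 0, 2])
# Token 0 was repeated twice, so its probability drops below token 1's
assert probs[0] < probs[1]
```

The divide-if-positive, multiply-if-negative rule mirrors how common repetition penalties treat logits of either sign; the only new ingredient here is the exponent on the count.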

Reference:
  • Llama 2: https://arxiv.org/abs/2307.09288
  • Locally typical sampling: https://arxiv.org/abs/2202.00666
  • Llama.cpp implementation: https://github.com/ggerganov/llama.cpp (the command line options reveal several tunable heuristics for next token selection)


VH: Distributed Artificial Intelligence for IIoT

Tutor: Vesa Hirvisalo (vesa.hirvisalo@aalto.fi)

Distributed Artificial Intelligence (DAI) is needed in large systems, such as future IIoT systems. In addition to distributed inference, there may be a need for distributed decision making. For example, a fleet of autonomous vehicles often must make decisions that affect each other. This kind of multi-agent system needs to share information between nodes, causing further load on the communication infrastructure. Also, training is often a very time-demanding and computationally heavy process that can benefit significantly from distributed training methods. The task is to write a review of DAI, especially the branches that include service structures within their research (DAI as a Service, DAIaaS).

Reference: 
  • N. Janbi, I. Katib, and R. Mehmood. Distributed artificial intelligence: Taxonomy, review, framework, and reference architecture, in Intelligent Systems with Applications, 18, 2023. DOI: 10.1016/j.iswa.2023.200231


DH: Amortized Meta Bayesian Optimization

Tutor: Daolang Huang (daolang.huang@aalto.fi)


Bayesian optimization (BO) is a well-established method for optimizing black-box functions whose direct evaluations are costly. In recent years, several meta-BO architectures have been proposed that can learn the surrogate model and the acquisition function end-to-end on a set of BO tasks using neural networks. In this project, our target is to read through several recent works on this topic and build our own pipeline for meta-BO. Prerequisites: Some knowledge of deep learning and probability, and a basic understanding of Bayesian optimization. PyTorch programming experience is expected.
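For background, the classical acquisition function that meta-BO methods learn to replace or emulate is often expected improvement (EI). A minimal closed-form sketch for a scalar Gaussian posterior (minimization convention; all numbers below are toy values):

```python
import math

def expected_improvement(mu, sigma, best):
    # Closed-form EI at one candidate point under a Gaussian posterior
    # N(mu, sigma^2), for minimization:
    #   EI = (best - mu) * Phi(z) + sigma * phi(z),  z = (best - mu) / sigma
    if sigma == 0.0:
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    phi = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)   # normal pdf
    Phi = 0.5 * (1 + math.erf(z / math.sqrt(2)))            # normal cdf
    return (best - mu) * Phi + sigma * phi

# EI trades off exploitation and exploration: a very uncertain candidate
# can outscore one with a slightly better posterior mean.
cands = {"a": (0.9, 0.01), "b": (1.0, 0.5)}   # (posterior mean, std)
best_seen = 1.0
scores = {k: expected_improvement(m, s, best_seen) for k, (m, s) in cands.items()}
assert scores["b"] > scores["a"]
```

In a meta-BO pipeline, a neural network trained on many tasks produces either the posterior (mu, sigma) or the acquisition score directly, replacing this hand-derived formula.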

Reference: 
  • MONGOOSE: Path-wise Smooth Bayesian Optimisation via Meta-learning https://arxiv.org/abs/2302.11533
  •  End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes https://arxiv.org/abs/2305.15930


KH: Large language models for modeling of physical systems

Tutor: Katsiaryna Haitsiukevich (katsiaryna.haitsiukevich@aalto.fi)

Large language models (LLMs) have changed the landscape of natural language processing (NLP) over the last few years. However, the capabilities of LLMs can be utilized in a number of applications beyond NLP tasks [1]. One can see LLMs as general pattern machines that can be applied to the extraction of more abstract, non-linguistic patterns from data (see, e.g., [2]). In this project, we will study different approaches to how LLMs can be utilized for modeling physical systems. For example, large language models can potentially accelerate the extraction of symbolic equations from the system description, similar to [3], or LLMs can be used to augment the training dataset to boost classifier/regressor performance, similar to [4].

Reference: 
  • Jablonka, K.M., Ai, Q., Al-Feghali, A., Badhwar, S., Bocarsly, J.D., Bran, A.M., Bringuier, S., Brinson, L.C., Choudhary, K., Circi, D. and Cox, S., 2023. 14 Examples of How LLMs Can Transform Materials Science and Chemistry: A Reflection on a Large Language Model Hackathon. Digital Discovery. 
  •  Mirchandani, S., Xia, F., Florence, P., Ichter, B., Driess, D., Arenas, M.G., Rao, K., Sadigh, D. and Zeng, A., 2023. Large language models as general pattern machines. arXiv preprint arXiv:2307.04721. 
  • Landajuela, M., Lee, C.S., Yang, J., Glatt, R., Santiago, C.P., Aravena, I., Mundhenk, T., Mulcahy, G. and Petersen, B.K., 2022. A unified framework for deep symbolic regression. Advances in Neural Information Processing Systems, 35, pp.33985-33998. 
  •  Hegselmann, S., Buendia, A., Lang, H., Agrawal, M., Jiang, X. and Sontag, D., 2023, April. Tabllm: Few-shot classification of tabular data with large language models. In International Conference on Artificial Intelligence and Statistics (pp. 5549-5581). PMLR.


AJ1: Where is the Beef? The Optimal Selection of Training Data.

Tutor: Alexander Jung (alex.jung@aalto.fi)

Assume you have a powerful model, such as a deep neural network with billions of tunable parameters, but way too little training data. Which other datasets, freely available via the internet (such as Wikipedia articles), are the most useful additions to your local training set? This project studies principled methods and fundamental limits for this training data selection problem.

Reference: 

  • Werner, M., He, L., Praneeth Karimireddy, S., Jordan, M., and Jaggi, M., "Provably Personalized and Robust Federated Learning", arXiv e-prints, 2023. doi:10.48550/arXiv.2306.08393.
  • Laurent Jacob, Jean-Philippe Vert, Francis Bach, "Clustered Multi-Task Learning: A Convex Formulation", Advances in Neural Information Processing Systems 21 (NIPS), 2008.
  • SarcheshmehPour, Y., Tian, Y., Zhang, L., and Jung, A., "Clustered Federated Learning via Generalized Total Variation Minimization", arXiv e-prints, 2021. doi:10.48550/arXiv.2105.12769.


AJ2: Trade-offs between Explainability and Accuracy of Machine Learning.

Tutor: Alexander Jung (alex.jung@aalto.fi)

Many ML application domains require more than just high accuracy (or statistical power) of trained models. The trained ML models should also be interpretable or explainable. This project studies the intrinsic trade-off between a precise measure of subjective explainability and the achievable accuracy in important ML settings.

Reference: 

  • A. Jung and P. H. J. Nardelli, "An Information-Theoretic Approach to Personalized Explainable Machine Learning," in IEEE Signal Processing Letters, vol. 27, pp. 825-829, 2020, doi: 10.1109/LSP.2020.2993176 
  • Zhang, L., Karakasidis, G., Odnoblyudova, A., Dogruel, L., and Jung, A., "Explainable Empirical Risk Minimization", arXiv e-prints, 2020. doi:10.48550/arXiv.2009.01492.


MK1: Certification of machine learning robustness

Tutor: Mikko Kiviharju (mikko.kiviharju@aalto.fi)

Some machine learning types are vulnerable to different data-level attacks, such as data poisoning, backdooring, or evasion. There are, however, methods to make implementations more robust, and authorities are becoming more aware of the need to check ML implementations specifically against data-level attacks. This seminar topic can be approached either comparatively, where existing national schemes (e.g., [BSI] and [NIST]) are compared to schemes that do not yet require testing ([KATAKRI]), or by weighing those schemes against the latest research.

Reference: 
  • [BSI] BSI, Federal Office for Information Security: "Security of AI-Systems: Fundamentals - Adversarial Deep Learning", https://www.bsi.bund.de/SharedDocs/Downloads/EN/BSI/KI/Security-of-AI-systems_fundamentals.html
  • [NIST] NIST: "Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations", https://doi.org/10.6028/NIST.AI.100-2e2023.ipd
  • [KATAKRI] Ministry for Foreign Affairs of Finland / National Security Authority: "KATAKRI 2020: Tietoturvallisuuden auditointityökalu viranomaisille" (information security audit tool for authorities), https://um.fi/katakri-tietoturvallisuuden-auditointityokalu-viranomaisille

MK2: OT vs IT cybersecurity management

Tutor: Mikko Kiviharju (mikko.kiviharju@aalto.fi)


Managing the cybersecurity of different operational technology (OT) systems (industrial control and automation, among others) may be challenging due to differing priorities between safety and security. This study will examine the recommended OT-specific issues via the NIST guidance for OT [NIST1] and the NIST Cybersecurity Framework (CSF) [NIST2].

Reference: 
  • [NIST1] NIST: "NIST Special Publication SP 800-82r3: Guide to Operational Technology (OT) Security", https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-82r3.ipd.pdf 
  • [NIST2] NIST. "Cybersecurity framework", https://www.nist.gov/cyberframework

MSA1: Effective and efficient knowledge distillation approaches

Tutor: Maryam Sabzevari (maryam.sabzevari@nokia-bell-labs.com)

Knowledge distillation is a technique in deep learning aimed at model compression, where a compact model, referred to as the student, is trained to replicate the behavior of a larger, more complex model known as the teacher. This research area focuses on refining knowledge distillation methodologies by exploring novel distillation architectures, multi-modal knowledge transfer, adaptive distillation strategies and self-distillation techniques.
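The basic teacher-student objective can be sketched in a few lines. This is the temperature-softened KL loss in the spirit of Hinton et al.'s original formulation; the logits and temperature below are illustrative values.

```python
import math

def softmax(zs, T=1.0):
    # Temperature-scaled softmax (max-shifted for numerical stability).
    m = max(zs)
    exps = [math.exp((z - m) / T) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_logits, T=4.0):
    # KL(teacher || student) on temperature-softened distributions;
    # the T**2 factor keeps gradient magnitudes comparable across T.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return T * T * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [4.0, 1.0, 0.2]
assert distillation_loss(teacher, teacher) < 1e-12        # perfect match: zero loss
assert distillation_loss([0.0, 0.0, 0.0], teacher) > 0.0  # uninformed student: positive
```

In practice this term is combined with the ordinary cross-entropy on the hard labels, and the research directions above vary what is transferred (features, attention maps, multiple modalities) rather than this core loss.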

Reference:
  • Huang, Tao, et al. "Knowledge distillation from a stronger teacher." Advances in Neural Information Processing Systems 35 (2022): 33716-33727. 
  •  Phuong, Mary, and Christoph Lampert. "Towards understanding knowledge distillation." International conference on machine learning. PMLR, 2019. 
  •  Cho, Jang Hyun, and Bharath Hariharan. "On the efficacy of knowledge distillation." Proceedings of the IEEE/CVF international conference on computer vision. 2019.

MSA2: Continual learning approaches and challenges

Tutor: Maryam Sabzevari (maryam.sabzevari@nokia-bell-labs.com)

Continual learning, also known as lifelong learning, is a critical research area in machine learning that addresses the challenge of enabling AI systems to learn and adapt continuously over time, much like how humans learn from new experiences. In continual learning, models strive to accumulate knowledge from various data sources or tasks sequentially while retaining the knowledge learned from previous experiences. This presents a unique set of challenges, including catastrophic forgetting, where new information can disrupt the performance on previously learned tasks. Researchers in this field explore methods for preserving and consolidating past knowledge, adapting to changing environments, and efficiently incorporating new information. Continual learning is vital for building AI systems that can evolve and improve over time, making it highly relevant in diverse applications.
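One widely used building block of the replay-based methods mentioned above is a memory buffer filled by reservoir sampling, so that examples from early tasks are not crowded out by later ones. A minimal sketch (capacity, task count, and task sizes are arbitrary choices):

```python
import random

class ReplayBuffer:
    """Reservoir-sampling memory: every example seen so far has an equal
    chance of being retained, regardless of which task it came from."""
    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.memory = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        self.seen += 1
        if len(self.memory) < self.capacity:
            self.memory.append(example)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.memory[j] = example

    def sample(self, k):
        # Mix these into each new-task batch to combat forgetting.
        return self.rng.sample(self.memory, min(k, len(self.memory)))

buf = ReplayBuffer(capacity=100)
for task in range(3):                  # three sequential "tasks"
    for i in range(1000):
        buf.add((task, i))
tasks_in_memory = {t for t, _ in buf.memory}
assert tasks_in_memory == {0, 1, 2}    # all tasks remain represented
```

Methods like those in the references refine this idea (e.g., compressing stored examples or pairing fast and slow learners), but the rehearsal mechanism itself is this simple.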

Reference:

  • Guo, Yiduo, Bing Liu, and Dongyan Zhao. "Online continual learning through mutual information maximization." International Conference on Machine Learning. PMLR, 2022. 
  •  Arani, Elahe, Fahad Sarfraz, and Bahram Zonooz. "Learning fast, learning slow: A general continual learning method based on complementary learning system." arXiv preprint arXiv:2201.12604 (2022). 
  •  Wang, Liyuan, et al. "Memory replay with data compression for continual learning." arXiv preprint arXiv:2202.06592 (2022).

JM: CDN attacks

Tutor: Jose Luis Martin Navarro (jose.martinnavarro@aalto.fi)


Content Delivery Networks (CDNs) are an essential part of the Internet infrastructure that improves the performance and scalability of content requests, such as webpages and media. They work as a geographically distributed proxy platform, caching and forwarding content on a massive scale. Many CDN providers include Distributed Denial-of-Service (DDoS) protection as an additional feature of their services, usually through tools such as web application firewalls (WAFs). Several studies have tried to develop a general model for CDN attacks [1][2]. However, the CDN landscape is not static: new variants of attacks emerge every year, and vendors develop fixes and countermeasures. In this project, we ask students to conduct a literature review on the latest DDoS attacks [3][4][5] and validate the results with an ethical approach. With this understanding, we aim to identify new potential threats to CDNs.

Reference: 

  • M. Ghaznavi, E. Jalalpour, M. A. Salahuddin, R. Boutaba, D. Migault and S. Preda, "Content Delivery Network Security: A Survey," in IEEE Communications Surveys & Tutorials, vol. 23, no. 4, pp. 2166-2190, Fourth quarter 2021, doi: 10.1109/COMST.2021.3093492.
  • Sharafaldin, Iman, Arash Habibi Lashkari, Saqib Hakak, and Ali A. Ghorbani. "Developing realistic distributed denial of service (DDoS) attack dataset and taxonomy." In 2019 International Carnahan Conference on Security Technology (ICCST), pp. 1-8. IEEE, 2019.
  • CDN Backfired: Amplification Attacks Based on HTTP Range Requests
  • Guo, Run, Jianjun Chen, Yihang Wang, Keran Mu, Baojun Liu, Xiang Li, Chao Zhang, Haixin Duan, and Jianping Wu. "Temporal CDN-Convex Lens: A CDN-Assisted Practical Pulsing DDoS Attack." In 32nd USENIX Security Symposium (USENIX Security 23), pp. 6185-6202. 2023.
  • Li, Zihao, and Weizhi Meng. "Mind the amplification: cracking content delivery networks via DDoS attacks." In Wireless Algorithms, Systems, and Applications: 16th International Conference, WASA 2021, Nanjing, China, June 25–27, 2021, Proceedings, Part II, pp. 186-197. Springer International Publishing, 2021.


MS: Security for future factories: overview of tools and technologies

Tutor: Mohit Sethi (mohit.sethi@aalto.fi)

Operational technology (OT) security for factories is currently receiving quite a lot of attention from industry as well as from regulators. The student is expected to provide an overview of different tools and technologies, such as unidirectional gateways and data diodes, that are available for ensuring the security of factories.

Reference: 

  • https://csrc.nist.gov/pubs/sp/800/82/r3/ipd 
  •  https://www.ideals.illinois.edu/items/124965
  •  https://www.opswat.com/products/unidirectional-security-gateway-guide
  • https://www.opswat.com/blog/how-we-test-unidirectional-gateway


HF: Prospects of WebAssembly to address performance and energy challenges of mobile clients

Tutor: Hannu Flinck (hannu.flinck@nokia.com)


WebAssembly (WASM) is a binary instruction format that allows code to be executed safely and efficiently across different platforms and programming languages. It was designed as a low-level virtual machine that runs code at near-native speed, making it ideal for performance-intensive applications. The motivation behind WASM is to enable developers to build web applications with high performance and portability, breaking the barriers imposed by traditional browser-based JavaScript execution. The run-time performance of WASM has been studied extensively against that of JavaScript. Recently, the energy consumption of WASM versus JavaScript has also been addressed; one study found that using WASM may reduce battery consumption by 39% in comparison to JavaScript.

References: 
  • W. Wang, "Empowering Web Applications with WebAssembly: Are We There Yet?," 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE), Melbourne, Australia, 2021, pp. 1301-1305, doi: 10.1109/ASE51524.2021.9678831. 
  •  J. De Macedo, R. Abreu, R. Pereira and J. Saraiva, "On the Runtime and Energy Performance of WebAssembly: Is WebAssembly superior to JavaScript yet?," 2021 36th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW), Melbourne, Australia, 2021, pp. 255-262, doi: 10.1109/ASEW52652.2021.00056.
  •  J. De Macedo, R. Abreu, R. Pereira and J. Saraiva, "WebAssembly versus JavaScript: Energy and Runtime Performance," 2022 International Conference on ICT for Sustainability (ICT4S), Plovdiv, Bulgaria, 2022, pp. 24-34, doi: 10.1109/ICT4S55073.2022.00014. 
  • Jangda, Abhinav, et al. "Not so fast: Analyzing the performance of WebAssembly vs. native code." 2019 USENIX Annual Technical Conference (USENIX ATC 19). 2019.
  •  Kjorveziroski, V., Filiposka, S. WebAssembly as an Enabler for Next Generation Serverless Computing. J Grid Computing 21, 34 (2023). https://doi.org/10.1007/s10723-023-09669-8 

JB1: Software supply chain security: the SLSA specification

Tutor: Jacopo Bufalino (jacopo.bufalino@aalto.fi)

Vulnerable code can leak into a production system at every step of the software supply chain. The SLSA specification has been proposed to address the key supply chain threats by verifying source, build, and dependency integrity. SLSA provides different levels of supply chain security guarantees, depending on the level of assurance needed for a given context. For this topic, you have to build a project that is SLSA level 3 compliant and present its related security guarantees.

Reference: 

  • https://slsa.dev/
  • https://dl.acm.org/doi/abs/10.1145/3372297.3420015
  • https://arxiv.org/abs/2002.01139 
  • https://ieeexplore.ieee.org/abstract/document/9740718 
  • https://owasp.org/www-project-software-component-verification-standard/


JB2: Testing network policies with eBPF

Tutor: Jacopo Bufalino (jacopo.bufalino@aalto.fi)

eBPF is an in-kernel Virtual Machine, designed to run sandboxed programs within the OS kernel. Its usage extends across a wide array of applications in both server environments and the cloud. eBPF applications include resource observability, network filtering and monitoring, and network security. In this seminar topic, the student will survey the existing eBPF applications in the field of network security and create an eBPF program to test the firewall policies of a Linux host.

Reference: 

  • https://ebpf.io/
  • https://cilium.io/
  •  https://github.com/lizrice/ebpf-beginners 
  •  https://dl.acm.org/doi/abs/10.1145/3371038 
  •  https://ieeexplore.ieee.org/abstract/document/9335808


MSI: Neural radiance fields (NeRF)

Tutor: Matti Siekkinen (matti.siekkinen@aalto.fi)

NeRF provides a means to synthesize novel views using a neural network based model that has been trained to learn a specific 3D scene from a set of images. NeRF basically enables 6DoF navigation within the learned scene and has applications, for instance, in XR. Many improvements have been made by the research community since the idea was first introduced a few years ago. The student's task is to survey the state of the art and identify still open problems.
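At NeRF's core is a simple volume-rendering quadrature along each camera ray: the network predicts a density and color per sample, and the pixel color is their transmittance-weighted sum. A minimal single-channel sketch (densities, colors, and step sizes below are toy values, not network outputs):

```python
import math

def render_ray(sigmas, colors, deltas):
    # NeRF's rendering quadrature: each sample contributes
    # alpha_i = 1 - exp(-sigma_i * delta_i), weighted by the transmittance
    # T_i = prod_{j<i} (1 - alpha_j); pixel value = sum_i T_i * alpha_i * c_i.
    color, T = 0.0, 1.0
    for sigma, c, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        color += T * alpha * c
        T *= 1.0 - alpha
    return color

# A bright sample behind a dense gray one is almost fully occluded:
c1 = render_ray([50.0, 50.0], [0.5, 1.0], [0.1, 0.1])
# With empty space in front, the bright sample dominates:
c2 = render_ray([0.0, 50.0], [0.5, 1.0], [0.1, 0.1])
assert abs(c1 - 0.5) < 0.01 and abs(c2 - 1.0) < 0.01
```

Because this sum is differentiable in the densities and colors, the scene network can be trained end-to-end from photographs; most follow-up work changes how (sigma, c) are represented or sampled, not this quadrature.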

Reference: 

  • https://www.matthewtancik.com/nerf 
  • https://paperswithcode.com/method/nerf


Last modified: Tuesday, 26 September 2023, 7:28 PM