Project Topics

If you are interested in any of the following projects, please email me, Dr Patrick Pang (see the homepage for my email address), to discuss further. Please attach your recent academic results to your email.

Deriving Topics from Twitter/Weibo/Forum Posts

Master Project

In this project you will identify topics hidden in a social media or community forum dataset. You will need to research the latest topic modelling and classification algorithms that are applicable to such text. You will then perform the necessary data pre-processing and experiment with both supervised and unsupervised algorithms on the dataset. This project suits students who have some knowledge of Machine Learning and Natural Language Processing.
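
As a rough illustration of the pre-processing step mentioned above, the sketch below cleans English-language posts and counts the most frequent remaining terms. It is only a starting point under stated assumptions: the stop-word list, the regular expressions, and the function names `preprocess` and `top_terms` are hypothetical, and Chinese text such as Weibo posts would need a word segmenter instead of this simple tokeniser.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real project would use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and", "in", "for", "my", "i"}

URL_RE = re.compile(r"https?://\S+")   # links carry little topical signal
MENTION_RE = re.compile(r"@\w+")       # user mentions such as @bob
TOKEN_RE = re.compile(r"[a-z0-9']+")   # simple word tokens; drops '#' and punctuation


def preprocess(post):
    """Lowercase a post, strip URLs and mentions, and return non-stop-word tokens."""
    text = URL_RE.sub(" ", post.lower())
    text = MENTION_RE.sub(" ", text)
    return [tok for tok in TOKEN_RE.findall(text) if tok not in STOP_WORDS]


def top_terms(posts, n=5):
    """Return the n most frequent tokens across all posts."""
    counts = Counter()
    for post in posts:
        counts.update(preprocess(post))
    return counts.most_common(n)
```

The token lists produced by `preprocess` could then be passed to whichever topic modelling or classification algorithm you choose to evaluate.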
Classifying Weibo Tweets based on People’s Health Information Needs

Master Project

This will be a software development project for building a research tool. You will need to build software that classifies Chinese Weibo tweets into different categories. First, you need to identify the health-related tweets among all tweets (about 10 million). Then you have to put them into different categories (e.g. health facts, news, advertisements, and questions). Some knowledge of NLP and classification algorithms is needed. You also need to be able to read Chinese to work on this project.
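
The two stages described above (filter health-related tweets, then assign a category) can be sketched as follows. This is a minimal illustration, not the intended solution: the keyword lists, the rule order, and the function names are all hypothetical, and simple substring matching stands in for the trained classifiers that the project would actually require.

```python
# Hypothetical keyword list for stage 1; a real system would learn this from labelled data.
HEALTH_KEYWORDS = ("健康", "医院", "感冒", "药", "疫苗")

# Hypothetical cue words for stage 2, checked in order; the first matching rule wins.
CATEGORY_RULES = [
    ("question", ("?", "？", "吗", "怎么")),
    ("advertisement", ("优惠", "购买", "包邮")),
    ("news", ("报道", "记者")),
]


def is_health_related(text):
    """Stage 1: keep a tweet if it contains any health keyword."""
    return any(keyword in text for keyword in HEALTH_KEYWORDS)


def categorise(text):
    """Stage 2: return the first category whose cue words appear, else 'health fact'."""
    for label, cues in CATEGORY_RULES:
        if any(cue in text for cue in cues):
            return label
    return "health fact"


def classify_posts(posts):
    """Run both stages and return (tweet, category) pairs for health-related tweets only."""
    return [(post, categorise(post)) for post in posts if is_health_related(post)]
```

Keeping the two stages as separate functions means either one can later be replaced by a learned model without changing the other.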
Touch-based Book Exploration App

Master Project (Software Development)

You will need to build an app for lay users to browse and explore books in a touch-based environment. This project consists of two parts. First, you will need to design a touch-based user interface (UI) for browsing books. We have data about books and would like to display related books adjacent to each other, but we do not yet know the best UI for this. Next, you will need to implement this design in a touch-based environment. You can choose to implement it either as a web app (e.g. HTML+CSS+JavaScript) or as an iPad app.
Visualisation of Health-related Tweets

Master Project

A picture is worth a thousand words. In this project, you will need to identify appropriate algorithms to visualise a dataset of health-related tweets, so that researchers can easily understand the content and the relationships among different types of data. In particular, you will need to identify the characteristics of the dataset and find the most suitable methods to present the data visually.