Abstract
Microblogs such as Twitter provide a platform for users to post short status messages, known as tweets, to express opinions on diverse topics. Twitter can be used in different ways, such as posting thoughts about daily life events and current political issues, keeping up with news events, and finding jobs. Interest in microblogs stems from their contextual immediacy: they give users here-and-now access, including the option to geotag tweets with the location of the post. Indonesia is currently experiencing rapid development in information, communication, investment, transportation, trade, and science. The current generation is multicultural, confident, highly inclined to volunteer, and technologically literate, and it has grown up alongside technological developments that make it easier to communicate on social networks. Career interest and decision making now involve a long process rather than a momentary desire. The process begins in primary education and must be based on students' ability and understanding. Differences in students' perceptions and abilities regarding career selection are inevitable, and perceptions of career choice also differ by gender.

Research Objective
In this research, we observe the Twitter stream to understand how students' interest in a particular field of expertise correlates with their career decisions, and whether students' perceptions in decision making differ by gender.

Methodology
In this study, we propose to use factorisation machines (FM), which can handle multiple aspects of the dataset. The model can predict user decisions while modelling user interests through content at the same time. Factorisation machines, introduced by Steffen Rendle, combine the advantages of Support Vector Machines (SVM) with those of factorisation models.
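As a minimal sketch of the model named above, the standard second-order factorisation machine scores a feature vector x as w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j, where each feature i has a latent vector v_i. The function names, parameter names (`w0`, `w`, `V`) and the toy numbers below are illustrative assumptions, not the implementation used in this study.

```python
def fm_predict_naive(x, w0, w, V):
    """Second-order FM score with an explicit loop over all feature pairs.

    x  : list of real-valued features
    w0 : global bias
    w  : list of per-feature linear weights
    V  : list of latent vectors, one per feature (each of length k)
    """
    n = len(x)
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            score += dot * x[i] * x[j]
    return score


def fm_predict(x, w0, w, V):
    """Same score in time linear in the number of features.

    Uses the identity
      sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ].
    """
    n = len(x)
    k = len(V[0])
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        score += 0.5 * (s * s - s_sq)
    return score


# Hypothetical toy input: three features, latent dimension two.
x = [1.0, 0.0, 2.0]
w0 = 0.5
w = [0.1, 0.2, 0.3]
V = [[1.0, 2.0], [0.5, 0.5], [3.0, -1.0]]
print(fm_predict(x, w0, w, V), fm_predict_naive(x, w0, w, V))
```

Because zero-valued features contribute nothing to either sum, only the non-zero entries of x matter in practice, which is what makes the model usable on sparse data.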
An FM is a predictor that works with any real-valued feature vector, can be optimised in the primal without relying on support vectors, has linear complexity, and models all interactions between variables using factorised parameters. Thus, FMs can estimate interaction parameters even in highly sparse data. We apply factorisation machines to Twitter stream data with its constraints. These constraints, called features, are extracted from the data for use in our models. The features are generated from the content of users' tweets in order to capture their interests. For content modelling, we use user id, tweet id, length of tweets, word count, word history, keyword count, keyword history, content relevance and keyword relevance as features. For user decision modelling, we use target user id and tweet id as features. The resulting method can thus mimic robust topic models while benefiting from the efficiency of a simpler form of modelling. For user decision modelling, we compare a number of ranking-based loss functions. We propose using the weighted approximately ranked pairwise (WARP) loss, introduced in (Usunier et al., 2009), which has been successfully applied in text and image retrieval tasks. We apply our proposed methods to the problem of modelling personal decision making on Twitter and explore a range of features, revealing which types of features contribute to the predictive modelling and how content features help with the prediction.

Results/findings
Job interest and decision inferences are made on unseen data. We further use a set of 31 time instances, one for each day of March, that do not overlap with the instances used for learning the calibration weights. We first compute the scores for each keyword per time instance, and then multiply those scores by the corresponding calibration weights. In our experiment, we also compute an indication of statistical significance for our inferences.
To do so, we consider more than 1900 keywords regarding job interests and positions, randomly permute the outcomes to produce randomised training and test data, and count how many times a model based on the randomly permuted data delivers better inference performance than the actual model; this fraction gives the p-value. We treat a p-value lower than 0.05 as statistically significant.
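The permutation procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: `permutation_p_value`, `fit_and_score` and `covariance_score` are hypothetical names, the scoring function is a toy stand-in for training and evaluating a model, and the number of permutations is arbitrary.

```python
import random


def permutation_p_value(features, outcomes, fit_and_score,
                        n_permutations=1000, seed=0):
    """Fraction of permuted runs that score strictly better than the actual run.

    fit_and_score(features, outcomes) is assumed to train a model and return
    a performance score where higher is better.
    """
    rng = random.Random(seed)
    actual = fit_and_score(features, outcomes)
    permuted = list(outcomes)
    better = 0
    for _ in range(n_permutations):
        rng.shuffle(permuted)  # break the link between features and outcomes
        if fit_and_score(features, permuted) > actual:
            better += 1
    return better / n_permutations


def covariance_score(features, outcomes):
    """Toy stand-in for model performance: covariance of the two lists."""
    n = len(features)
    mean_x = sum(features) / n
    mean_y = sum(outcomes) / n
    return sum((a - mean_x) * (b - mean_y)
               for a, b in zip(features, outcomes)) / n


# Hypothetical data with a perfect linear relationship.
features = [float(v) for v in range(20)]
outcomes = [2.0 * v + 1.0 for v in features]
p_value = permutation_p_value(features, outcomes, covariance_score)
print(p_value, p_value < 0.05)
```

The sketch returns the plain fraction described in the text. A common variant adds one to both the numerator and the denominator so that the estimated p-value is never exactly zero.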