Multimedia Big Data Analytics from HKUST-NIE Social Media Lab


Billions of user-shared images are generated on social media every day and are widely accessible to others because sharing is intrinsic to these platforms. This work shows that user-shared images are an effective alternative for improving service relevance by discovering users' connections, such as friendships and communities, on social media platforms. It investigates over 4 million images from 9 social networks. Across these networks, connected users and users in the same community tend to share visually similar images, regardless of the origins and content-sharing mechanisms of the networks, and applications are proposed accordingly.
Connection discovery [C1, C2, C3, J1, S2]. User connections, or the social graph, can take any form of online social relationship, such as online friendships, follower/followee relationships and community memberships. The aim is to enhance service relevance using the discovered user connections and communities, and the discovery is shown to be 4 times better than chance.
Gender identification [S3]. Many social media applications, such as recommendation, virality prediction and marketing, make use of user gender, which may not be explicitly specified or may be kept private. A multimedia big data system that utilizes the observed phenomenon is proposed for user gender identification and achieves 74% accuracy.


Data Collection


The images are collected from a set of randomly selected users on each social network, either through the official API (Twitter, Skyrock, 163 Weibo and Tencent Weibo) or with a Ruby-based web scraper (Flickr, Pinterest, Duitang and Digu). The details of those images are shown in the table. Because the operators of the social networks limit the maximum number of requests (and therefore the number of images that can be downloaded), the user-generated images are collected in parallel with virtual machines running on cloud platforms such as Amazon EC2.
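
The idea of spreading the collection over several machines, each staying under its own request limit, can be sketched as follows. This is only an illustration, not the lab's collector: local threads stand in for the virtual machines, and `partition_users`, `collect_for_worker`, `collect_in_parallel`, `fake_fetch` and the `max_requests` quota are all hypothetical names introduced here.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_users(user_ids, n_workers):
    """Split the user list round-robin into one shard per worker."""
    return [user_ids[i::n_workers] for i in range(n_workers)]


def collect_for_worker(user_ids, fetch_images, max_requests):
    """Fetch images user by user, stopping once the request quota is used up."""
    collected = {}
    requests_made = 0
    for user_id in user_ids:
        if requests_made >= max_requests:
            break  # assumed per-worker limit imposed by the network operator
        collected[user_id] = fetch_images(user_id)
        requests_made += 1
    return collected


def collect_in_parallel(user_ids, fetch_images, n_workers, max_requests):
    """Run one collector per shard; threads stand in for virtual machines."""
    shards = partition_users(user_ids, n_workers)
    results = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = pool.map(
            lambda shard: collect_for_worker(shard, fetch_images, max_requests),
            shards,
        )
        for part in parts:
            results.update(part)
    return results


def fake_fetch(user_id):
    """Hypothetical stand-in for an API call or scraper request."""
    return [f"{user_id}_img{i}.jpg" for i in range(2)]


users = [f"user{i}" for i in range(6)]
images = collect_in_parallel(users, fake_fetch, n_workers=3, max_requests=2)
limited = collect_in_parallel(users, fake_fetch, n_workers=3, max_requests=1)
print(len(images), len(limited))
```

With three workers and a quota of two requests each, all six users are covered; lowering the quota to one leaves half of them uncollected, which is why adding workers helps when the limit is applied per machine.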

| Origin | Social network | User-shared images | Feature file size (GB) | Image file size (GB) | Min/mean/median/max image size (KB) |
|--------|----------------|--------------------|------------------------|----------------------|-------------------------------------|
| US     | Flickr         | 201,006            | 80.7                   | 35.3                 | 2.1/193.8/178/1960                  |
| US     | Pinterest      | 314,369            | 133                    | 23.5                 | 0.671/77.1/51.4/1020                |
| US     | Twitter        | 150,696            | 49.6                   | 8.07                 | 0.338/58.5/41.3/1340                |
| France | Skyrock        | 176,547            | 39.7                   | 12.6                 | 0.0684/75.4/37.1/9020               |
| China  | Duitang        | 396,534            | 134                    | 43.9                 | 0.947/116/68.2/6910                 |
| China  | Digu           | 148,337            | 92.9                   | 25                   | 0.181/179/86.0/14400                |
| China  | 163 Weibo      | 187,491            | 58.1                   | 16.7                 | 0.0791/102/56.4/7500                |
| China  | Tencent Weibo  | 490,624            | 98                     | 17.3                 | 0.616/37.2/26.3/963                 |
| Total  |                | 2,075,604          | 686                    | 182                  |                                     |

Data Processing

The scraped user-generated images are processed with a MATLAB-based program. The program uses the Parallel Computing Toolbox to reduce the long processing time of Bag-of-Features (BoF) Tagging by utilizing all cores and memory available on the machines. Once the BoFT labels (the labels that BoF Tagging assigns to each image) are obtained, each user's profile is calculated by counting the occurrences of each label among that user's images.
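
The profile-building step can be sketched in a few lines. The example below is in Python rather than MATLAB and is only a minimal illustration: it assumes each image has already been assigned a single integer BoFT label, and the function name `build_user_profiles` and the toy data are hypothetical.

```python
from collections import Counter


def build_user_profiles(image_labels, n_labels):
    """Count how often each BoFT label occurs among every user's images.

    image_labels: iterable of (user_id, label) pairs, one pair per image.
    Returns {user_id: [count of label 0, ..., count of label n_labels - 1]}.
    """
    counts = {}
    for user_id, label in image_labels:
        counts.setdefault(user_id, Counter())[label] += 1
    # A Counter returns 0 for labels the user never shared.
    return {u: [c[k] for k in range(n_labels)] for u, c in counts.items()}


# Hypothetical toy data: three images from one user, two from another.
image_labels = [("alice", 0), ("alice", 0), ("alice", 2), ("bob", 1), ("bob", 2)]
profiles = build_user_profiles(image_labels, n_labels=3)
print(profiles)  # {'alice': [2, 0, 1], 'bob': [0, 1, 1]}
```

Each profile is a fixed-length count vector, so profiles of different users can be compared directly in the analytics step.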

Data Analytics

The user profiles are used in data analytics to calculate the similarity among users. Once the similarity among users is calculated, connections and communities can be discovered. This part is processed by a MATLAB-based program, with approaches such as hierarchical clustering for community detection. The results can be applied in many areas, such as recommendation and virality prediction.
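
A small sketch of this analytics step is given below, in Python rather than MATLAB. It rests on assumptions not stated above: cosine similarity is used as the similarity measure, and the hierarchical clustering is a simple average-linkage agglomerative procedure. All function names and the toy profiles are hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(p, q):
    """Cosine similarity of two label-count profiles (an assumed measure)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def pairwise_similarity(profiles):
    """Similarity for every unordered pair of users."""
    return {frozenset((u, v)): cosine_similarity(profiles[u], profiles[v])
            for u, v in combinations(sorted(profiles), 2)}


def recommend(user, profiles, sim, k=1):
    """Connection discovery: the k users most similar to `user`."""
    others = [v for v in profiles if v != user]
    return sorted(others, key=lambda v: sim[frozenset((user, v))], reverse=True)[:k]


def average_linkage_clusters(profiles, sim, n_clusters):
    """Community detection: merge the two most similar clusters repeatedly."""
    clusters = [[u] for u in sorted(profiles)]

    def linkage(a, b):
        # Average similarity over all cross-cluster user pairs.
        return sum(sim[frozenset((u, v))] for u in a for v in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


# Hypothetical profiles: counts of three BoFT labels per user.
profiles = {"alice": [5, 1, 0], "bob": [4, 2, 0], "carol": [0, 1, 6], "dave": [0, 2, 5]}
sim = pairwise_similarity(profiles)
print(recommend("alice", profiles, sim))            # ['bob']
print(average_linkage_clusters(profiles, sim, 2))   # [['alice', 'bob'], ['carol', 'dave']]
```

Users whose images fall under the same labels end up both as each other's top recommendation and in the same cluster, which mirrors the observation that connected users share visually similar images.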


  1. [J1] M. Cheung, J. She and Z. Jie, "Connection Discovery Using Big Data of User-Shared Images in Social Media", IEEE Transactions on Multimedia, vol. 17, no. 9, pp. 1417-1428, Sept. 2015.

  2. [V1] M. Cheung and J. She, "Discovering User Connections using Their Shared Images on Social Media", the 12th IEEE International Conference on Ubiquitous Intelligence and Computing (UIC 2015), Aug. 2015. (Video)

  3. [C3] M. Cheung, J. She and X. Li, "Non-user Generated Annotation on User Shared Images for Connection Discovery", the IEEE International Conference on Cyber, Physical and Social Computing, Dec. 2015. (Accepted)

  4. [C2] Z. Jie, M. Cheung and J. She, "A Cloud-assisted Framework for Bag-of-Features Tagging in Social Networks", IEEE 4th Symposium on Network Cloud Computing and Applications (NCCA), Jun. 2015.

  5. [C1] M. Cheung and J. She, "Bag-of-Features Tagging Approach for a Better Recommendation with Social Big Data", IMMM 2014, the Fourth International Conference on Advances in Information Mining and Management, 2014. (Best Paper Award)

    1. [S1] M. Cheung and J. She, "Understanding User Connections through their Shared Images in Social Media". (Submitted)

    2. [S2] M. Cheung, J. She and Z. Jie, "Discovery User Connections and Communities from User Shared Images in Social Media". (Submitted)

    3. [S3] Z. Jie, M. Cheung and J. She, "An Efficient Computation Framework for Connection Discovery using Shared Images". (Submitted)

    4. [S4] M. Cheung and J. She, "An Analytic System for User Gender Identification Through User Shared Images". (Submitted)