Finding the Next Social Media Star


Data scraping of social media enables the extraction of valuable information for diverse use cases, such as identifying emerging trends, analyzing public sentiment, monitoring competitors, pinpointing trending influencers, and gaining deep customer insights. It aids businesses in tailoring marketing strategies, enhancing content engagement, conducting competitive analysis, and fostering targeted product development. The process provides a dynamic overview of the social media landscape, allowing for informed decision-making and strategic alignment with consumer behavior and preferences.


A startup in the entertainment industry came to Atlas looking to source new leads for their company.

The startup had ambitious plans: to find rising stars before the creators themselves realized they were rising. To reach that goal, we needed a plan for both identifying content creators and predicting which of them had the potential for viral growth. Data scraping became the foundation for identification, and machine learning added a layer that helped home in on the creators with the most growth potential.

Solution Overview

Understanding Client Needs

Our client focused on a subset of content genres that best matched their industry and areas of expertise. As a result, when running the large-scale data scrape, we were able to home in on the portions most relevant to our client’s needs. In addition, details such as a content creator's follower/following count, video count, and time on the platform shaped decisions about what to focus on and what to filter out.

Data Scraping Layer

With a better understanding of the project scope, we began data collection. Customized AtlasBots were developed to extract data from the social media platform, focusing on content creators' profiles, genres, videos, hashtags, likes, shares, comments, followers, and more. Alongside this, a PSQL database was set up to record and update video growth for each content creator, tracking changes over time and enabling a historical view of the creator's evolution.
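A minimal sketch of how time-stamped snapshots could give that historical view. The table name, columns, and helper functions are hypothetical, and Python's built-in sqlite3 stands in for the PSQL database purely so the example runs without a server.

```python
import sqlite3

# Hypothetical schema: one row per creator per scrape date.
SCHEMA = """
CREATE TABLE IF NOT EXISTS creator_snapshots (
    creator_id  TEXT    NOT NULL,
    captured_on TEXT    NOT NULL,  -- ISO date of the scrape
    followers   INTEGER NOT NULL,
    video_count INTEGER NOT NULL,
    PRIMARY KEY (creator_id, captured_on)
)
"""


def open_store(path=":memory:"):
    """Open the snapshot store and create the table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_snapshot(conn, creator_id, captured_on, followers, video_count):
    """Insert a snapshot; a re-scrape on the same date replaces the old row."""
    conn.execute(
        "INSERT OR REPLACE INTO creator_snapshots VALUES (?, ?, ?, ?)",
        (creator_id, captured_on, followers, video_count),
    )
    conn.commit()


def history(conn, creator_id):
    """Return (captured_on, followers, video_count) rows, oldest first."""
    rows = conn.execute(
        "SELECT captured_on, followers, video_count FROM creator_snapshots "
        "WHERE creator_id = ? ORDER BY captured_on",
        (creator_id,),
    )
    return rows.fetchall()
```

Keying each row on creator and scrape date keeps one record per day, so repeated scrapes update the latest figures without duplicating history.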

Data Engineering Layer

A robust storage structure combining S3 and RDS resources was provisioned to store the scraped data, allowing for quick retrieval and in-depth analysis. Several layers of data cleaning and filtering were applied to refine the raw Internet dataset into a more digestible format. In addition, continuous quality checks were implemented to ensure the accuracy and completeness of the stored data.
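As an illustration of what one cleaning and quality-check pass might look like, the sketch below drops incomplete or malformed records and removes duplicates. The field names and rejection rules are assumptions made for the example, not the project's actual pipeline.

```python
# Hypothetical fields that every scraped record is expected to carry.
REQUIRED = ("creator_id", "captured_on", "followers", "video_count")


def clean_records(raw_records):
    """Return (cleaned_records, rejected_count) for a batch of scraped dicts."""
    cleaned = {}
    rejected = 0
    for rec in raw_records:
        # Completeness check: every required field must be present and non-empty.
        if any(rec.get(key) in (None, "") for key in REQUIRED):
            rejected += 1
            continue
        # Type check: counts must parse as integers.
        try:
            followers = int(rec["followers"])
            video_count = int(rec["video_count"])
        except (TypeError, ValueError):
            rejected += 1
            continue
        # Range check: negative counts indicate a scraping error.
        if followers < 0 or video_count < 0:
            rejected += 1
            continue
        # Deduplicate on (creator, date); a later record replaces an earlier one.
        creator_id = str(rec["creator_id"]).strip()
        cleaned[(creator_id, rec["captured_on"])] = {
            "creator_id": creator_id,
            "captured_on": rec["captured_on"],
            "followers": followers,
            "video_count": video_count,
        }
    return list(cleaned.values()), rejected
```

Returning the rejected count alongside the cleaned records gives a simple completeness signal that can be monitored from one scrape to the next.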

Machine Learning Layer

A crucial component of the project was the application of machine learning to forecast the success of various content and identify the rising stars among content creators. A predictive algorithm was trained on the historical data of video count growth. This allowed the model to gauge the potential trajectory of new and existing content, and to pinpoint creators who were on the brink of becoming major influencers in the industry. The forecasts were continuously updated and refined as new data was integrated, providing a dynamic view of the ever-changing social media landscape. By identifying these rising stars early, businesses and producers were able to make better strategic decisions, tapping into emerging talents and trends before they reached the broader market and thus gaining a competitive edge. Integrating machine learning into this process brought data-driven insights to the forefront of decision-making within the music and entertainment sectors.
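The model used in the project is not described here, so the sketch below swaps in a deliberately simple stand-in: a least-squares fit of exponential growth to a creator's historical counts, used to forecast the next value and to rank creators by growth rate. All function names are hypothetical, and the input is assumed to be a list of positive counts taken at evenly spaced intervals.

```python
import math


def fit_log_growth(counts):
    """Least-squares fit of log(count) = a + b * t for t = 0..n-1; returns (a, b)."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two observations")
    if any(c <= 0 for c in counts):
        raise ValueError("counts must be positive to take logarithms")
    ys = [math.log(c) for c in counts]
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    b = sxy / sxx  # growth rate per interval, in log units
    a = y_mean - b * t_mean
    return a, b


def forecast(counts, steps_ahead=1):
    """Extrapolate the fitted curve steps_ahead intervals past the last observation."""
    a, b = fit_log_growth(counts)
    t = len(counts) - 1 + steps_ahead
    return math.exp(a + b * t)


def rank_by_growth(histories):
    """Order creator ids from fastest to slowest fitted growth rate."""
    rates = {cid: fit_log_growth(counts)[1] for cid, counts in histories.items()}
    return sorted(rates, key=rates.get, reverse=True)
```

For example, a series that doubles every interval, such as 100, 200, 400, 800, extrapolates to roughly 1600 one step ahead. Refitting whenever a new snapshot arrives mirrors the continuous updating described above, although a production model would likely draw on many more features than a single count series.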

Data Visualization Layer

To make the insights and predictions accessible and actionable, a custom-built front-end web tool was developed. This platform allowed our client to interact with the data sets and the outputs of the machine learning model. Graphical representations illustrated the historical growth trends of various content creators, as well as predictions of future trends. Dynamic dashboards, charts, and visual aids were integrated to allow easy navigation and interpretation of complex data. This visual interface became an essential tool for stakeholders to understand, explore, and leverage the information uncovered through data scraping and machine learning, enabling them to make data-driven decisions and shape their strategies.

Report Generation

Recognizing the need for versatile access to the data, custom PSQL scripts were incorporated into the software to allow easy export of data into CSV format. With a simple query, users could generate comprehensive reports outlining the trending content creators, their growth trends, engagement metrics, and other relevant information. These CSV files integrated readily with various reporting tools, enabling stakeholders to create detailed visual reports and presentations. Whether for internal analyses or external collaborations, this feature ensured that the insights drawn from the social media platform were readily available and adaptable to various business needs and objectives.
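The project's export ran through PSQL scripts. As a rough Python equivalent of the same idea, the sketch below turns per-creator follower histories into a CSV report sorted by growth. The column names and the growth_multiple metric are assumptions chosen for illustration.

```python
import csv

# Hypothetical report columns.
FIELDS = ["creator_id", "followers_start", "followers_end", "growth_multiple"]


def build_report_rows(histories):
    """Turn {creator_id: [followers over time]} into rows sorted by growth."""
    rows = []
    for creator_id, followers in histories.items():
        # Skip creators without enough history or with a zero starting point.
        if len(followers) < 2 or followers[0] <= 0:
            continue
        rows.append({
            "creator_id": creator_id,
            "followers_start": followers[0],
            "followers_end": followers[-1],
            "growth_multiple": round(followers[-1] / followers[0], 2),
        })
    rows.sort(key=lambda row: row["growth_multiple"], reverse=True)
    return rows


def write_report(histories, out):
    """Write the report as CSV to any text file-like object."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(build_report_rows(histories))
```

Writing to a file-like object means the same function can serve a file on disk, an in-memory buffer, or an HTTP response from the front-end tool.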

Conclusion & Next Steps

This case of using data scraping to identify trending content creators demonstrates a strategic approach to leveraging big data and machine learning. It showcases how technology can be used to gain actionable insights in the ever-evolving social media landscape, ultimately informing strategies and uncovering new opportunities. Several emerging and trending creators were spotlighted, providing valuable leads for various industry stakeholders. Machine learning-powered forecasting helped pinpoint the next big hits and talents, allowing for timely strategic decisions. In some cases, content creators were identified before their content experienced over 100x growth! The insights derived from this project enabled various stakeholders to make better decisions.