Sample donated: Dawn Hicks
Last updated: July 18, 2019
Abstract: YouTube programs now attract wide publicity, and many viewers follow them closely. They attract not only audiences but also advertisers, so we use a new method to help advertisers find the most popular program and its channel name. An accurate and sensible forecast of a program’s popularity is of great value to content providers, advertisers, and broadcast TV operators, and advertisers can use it to make profitable investment plans. Many prediction models are commonly used to predict program popularity.
However, these methods require abundant samples and extensive training, and they have poor prediction accuracy. An improved prediction approach is proposed: it uses the k-medoids algorithm to cluster the data into four trends, and the clusters are then given as input to a gradient boosting decision tree and to the extreme gradient boosting algorithm. Finally, we check which of the two gives better prediction results.

Keywords: YouTube, predicting names, k-medoids algorithm, random forest regression

I. INTRODUCTION

As new technologies such as 3D technology become more popular, people are increasingly drawn to internet videos, which has led broadcast TV channels to publish their programs on platforms such as YouTube. Telecasting TV programs on the internet to increase their popularity is now an emerging trend.
According to recent studies, internet streaming of broadcast TV programs will continue to grow at a rapid pace. Not all programs receive an equal response: only a few gain enormous user attention, while the rest are left with hardly any viewers. From this perspective, it is of great importance to forecast the popularity of these programs. Using program popularity predictions, audiences can save much time when trying to discover valuable programs among massive collections of video resources, which improves user satisfaction.
Based on program popularity data, a company can maximize its marketing effect by choosing the programs with the highest potential. However, accurately predicting the popularity of broadcast TV programs is difficult, because it depends on the quality of the program and the interests of the audience. In addition, there is a massive gap between the popularity evolutionary trends of different programs, which should be considered when designing the prediction model [5].

This paper proposes an enhanced method to predict program popularity among YouTube programs. The main aspects of our work on popularity prediction are as follows. First, we use the k-medoids algorithm to cluster programs with similar popularity into four evolutionary trends. This approach gives more efficient outcomes than the methods previously used to delineate popularity evolutionary trends [5]. Second, we build trend-specific prediction models using the gradient boosting algorithm and the extreme gradient boosting algorithm, and determine which one achieves higher overall predictive performance.
Fig-1. Flow of methodology

II. LITERATURE REVIEW

Program popularity prediction began with news articles, which opened a new way of predicting online content. That work also introduced new methods to predict news comment volume and the popularity of news articles, such as those discussed in [1]. A similar method is discussed in [2], which uses a log-linear model to predict the data. [3] observed that a Poisson process can describe the popularity gained by videos, which followed three popularity evolutionary trends.
[4] used YouTube data to forecast the popularity of web content based on chronological information given by early popularity measures. [5] used a new k-medoids algorithm along with random forest regression to predict the popularity of broadcast TV channel content; it predicts the name of the program that is popular. All the previous methods focus only on a general model for predicting the popularity of a program and are ineffective at predicting popularity among broadcast channels; [5] is the only method that predicts popularity among broadcast TV channel programs.

III. METHODOLOGY

The methodology includes three different algorithms. First, the k-medoids algorithm finds the evolutionary trends.
Program popularity follows different types of propagation trends, each with a different level of features. The data become more useful for prediction once programs are grouped by trend, so k-medoids is used in place of k-means clustering to group them. Reportedly, using more than four clusters does not give an accurate prediction model, so this paper uses four clusters, which are passed to the gradient boosting algorithm to make the predictions.
Finally, the extreme gradient boosting algorithm is applied to the same input used for gradient boosting to predict the popularity of programs, and we check which one provides better predictions.

A. TREND DETECTION

This section describes the k-medoids clustering of program popularity into four trends [5]. In k-means clustering, the center of a cluster is the mean of the members of the cluster, whereas in k-medoids the center of each cluster is a medoid: an actual member of the cluster whose total dissimilarity to the other members is smallest. Because its centers are real data points, k-medoids is less sensitive to outliers than k-means. The other steps of the k-medoids algorithm are similar to those of k-means.
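
The alternating assignment and update steps of k-medoids can be sketched as follows. This is a minimal illustration in Python, not the implementation used in this study (which was written in R). The function names `k_medoids` and `euclidean`, the Euclidean distance, and the toy data with two clusters are assumptions made for the example.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples (assumed metric)."""
    return math.dist(a, b)


def k_medoids(points, k, distance, max_iter=100, seed=0):
    """Alternate between assigning points to medoids and re-picking medoids."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)  # indices of the initial medoids
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            nearest = min(medoids, key=lambda m: distance(p, points[m]))
            clusters[nearest].append(i)
        # Update step: the new medoid is the member with the smallest
        # total distance to all members of its cluster.
        new_medoids = []
        for m, members in clusters.items():
            if not members:  # only possible with duplicate points; keep old medoid
                new_medoids.append(m)
                continue
            best = min(
                members,
                key=lambda c: sum(distance(points[c], points[j]) for j in members),
            )
            new_medoids.append(best)
        if set(new_medoids) == set(medoids):
            break  # medoids stopped changing
        medoids = new_medoids
    # Final labels: position (0..k-1) of the nearest medoid for every point.
    labels = [
        min(range(k), key=lambda c: distance(p, points[medoids[c]])) for p in points
    ]
    return medoids, labels


# Toy data: two well-separated groups on a line.
points = [(1.0,), (2.0,), (3.0,), (100.0,), (101.0,), (102.0,)]
medoids, labels = k_medoids(points, k=2, distance=euclidean)
```

Each returned medoid is an index into `points`, so the cluster center is always a real observation. In the setting of this paper, `k` would be 4 and each point would describe one program's popularity.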
The resulting clusters are important because they are used as input to the different prediction algorithms.

B. TREND-SPECIFIC PREDICTION

This section describes the use of the gradient boosting algorithm to build trend-specific prediction models. It tends to produce more accurate values than other prediction algorithms.
Decision trees are sensitive to the data on which they are trained [5]. Other algorithms produce trees with high structural similarity, but in gradient boosting each tree is different. Stable results for estimating variable importance are reported to be achieved with a higher value [5].

C. CLASSIFICATION OF PROGRAM’S POPULARITY USING EXTREME GRADIENT BOOSTING

This section discusses the extreme gradient boosting algorithm, which is built on the principles of gradient boosting. The difference between the two is that extreme gradient boosting uses a regularized model formulation to control over-fitting, which gives better performance. A single decision tree can over-fit; gradient boosting overcomes this by combining hundreds of trees, each containing some leaf nodes [5].
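
A regularized formulation of this kind is commonly written as the objective below. The notation follows the usual presentation of extreme gradient boosting and is an assumption here, since the text gives no formula: l is a differentiable loss, f_k is the k-th tree, T is the number of leaves in a tree, w is its vector of leaf weights, and γ and λ are penalty coefficients.

```latex
\mathcal{L} = \sum_{i=1}^{n} l\bigl(y_i, \hat{y}_i\bigr) + \sum_{k=1}^{K} \Omega(f_k),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2}\,\lambda \lVert w \rVert^{2}
```

The first sum measures how well the ensemble fits the training data, and the second penalizes trees with many leaves or large leaf weights, which is what discourages over-fitting.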
The extreme gradient boosting model gives better forecasting performance than other models, and it is also fast: it is reported to be about ten times faster than other algorithms. The decision trees are built to predict new popularity trends.
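
The relation between the two boosting variants can be illustrated with a small sketch, assuming squared-error loss, a single input feature, and one-split trees (stumps). It is written in Python for illustration and is not the R implementation used in this study. The names `fit_stump` and `boost` and the parameter `lam` are hypothetical; `lam` plays the role of an L2 penalty on leaf values, so `lam=0` gives plain gradient boosting, while `lam>0` shrinks each leaf in the way a regularized formulation does. The split criterion is simplified and is not the gain formula used by real extreme gradient boosting libraries.

```python
def fit_stump(xs, residuals, lam):
    """Fit a one-split regression tree to the residuals, with shrunk leaf values."""

    def leaf(values):
        # With lam = 0 this is the plain mean; lam > 0 pulls the leaf toward 0.
        return sum(values) / (len(values) + lam)

    best = None
    for t in sorted(set(xs))[:-1]:  # candidate thresholds between distinct values
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        wl, wr = leaf(left), leaf(right)
        err = sum((r - wl) ** 2 for r in left) + sum((r - wr) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, wl, wr)
    if best is None:  # all x values identical: fall back to a single leaf
        w = leaf(residuals)
        return lambda x: w
    _, t, wl, wr = best
    return lambda x: wl if x <= t else wr


def boost(xs, ys, n_trees=50, learning_rate=0.1, lam=0.0):
    """Additive model: start from the mean, then repeatedly fit the residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        # For squared loss the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals, lam)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


# Toy data: a step from 1 to 5.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
plain = boost(xs, ys, n_trees=200, lam=0.0)
regularized = boost(xs, ys, n_trees=200, lam=3.0)
```

For the same number of trees, the regularized model follows the training data less closely, which is the trade-off that helps to limit over-fitting.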
Thus it produces an efficient result when predicting popularity among YouTube videos.

IV. EXPERIMENTS

A. DATASET

The data used in this paper is the YouTube trending videos dataset. It includes several features, such as the title of the program, channel name, views, likes, and dislikes. A summary of the dataset is given in Table 1.

          Views      Likes     Dislikes
Min       1141       0         0
Median    357858     8774      276
Mean      1109764    41621     2073
Max       66637636   2542863   504340

Table 1: Summary of dataset

B. DISCUSSION

We use RStudio to implement the k-medoids clustering, gradient boosting, and extreme gradient boosting algorithms in this study. First, the k-medoids algorithm is used to split the data into four clusters.

Fig-2. k-medoids clusters

The clusters obtained from this algorithm are used in gradient boosting, and the predictions are made according to these clusters. The clusters are then used in the extreme gradient boosting model to predict the popularity of all the programs.
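
The idea of making predictions per cluster can be sketched as below. This is an illustrative Python outline under stated assumptions, not the R code used in the study: `fit_per_cluster`, `fit_mean`, and `rmse` are hypothetical names, the mean predictor is only a placeholder for a boosted model, and root-mean-square error is an assumed comparison metric, since the text does not name one.

```python
from collections import defaultdict
from math import sqrt


def fit_per_cluster(features, targets, labels, fit):
    """Train one model per cluster; `fit(xs, ys)` must return a predict function."""
    grouped = defaultdict(lambda: ([], []))
    for x, y, c in zip(features, targets, labels):
        grouped[c][0].append(x)
        grouped[c][1].append(y)
    return {c: fit(xs, ys) for c, (xs, ys) in grouped.items()}


def fit_mean(xs, ys):
    """Placeholder model: always predicts the training mean."""
    mean = sum(ys) / len(ys)
    return lambda x: mean


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


# Toy data: two clusters with very different popularity levels.
features = [1, 2, 3, 4]
targets = [10.0, 12.0, 100.0, 104.0]
labels = [0, 0, 1, 1]

models = fit_per_cluster(features, targets, labels, fit_mean)
per_cluster_pred = [models[c](x) for x, c in zip(features, labels)]

global_model = fit_mean(features, targets)
global_pred = [global_model(x) for x in features]
```

Comparing `rmse(targets, per_cluster_pred)` with `rmse(targets, global_pred)` shows why a model fitted per trend can do better than one general model. The same comparison could be used to rank two different learners passed in as `fit`.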
Compared with the gradient boosting algorithm, the extreme gradient boosting algorithm gives better results.

V. CONCLUSION

In this paper we have predicted the popularity of programs according to their publish time. We used the k-medoids algorithm to cluster programs into four trends, which can capture the popularity behavior of the programs. Gradient boosting was then used to forecast popularity, followed by extreme gradient boosting. The latter gives more accurate prediction results than the commonly used gradient boosting algorithm.
The experimental results show a gain in accuracy over the methods previously used to forecast program popularity among YouTube videos. The approach also gives a consistent prediction outcome much faster.

References

[1] M. Tsagkias, W. Weerkamp, and M. de Rijke, “News comments: Exploring, modeling, and online prediction,” in Advances in Information Retrieval. Cham, Switzerland: Springer, 2010, pp. 191-203.
[2] G. Szabo and B. A. Huberman, “Predicting the popularity of online content,” Commun. ACM, vol. 53, no. 8, pp. 80-88, 2010.
[3] R. Crane and D. Sornette, “Robust dynamic classes revealed by measuring the response function of a social system,” Proc. Nat. Acad. Sci. USA, vol. 105, no. 41, pp. 15649-15653, 2008.
[4] H. Pinto, J. M. Almeida, and M. A. Gonçalves, “Using early view patterns to predict the popularity of YouTube videos,” in Proc. 6th ACM Int. Conf. Web Search Data Mining, 2013, pp. 365-374.
[5] C. Zhu, G. Cheng, and K. Wang, “Big data analytics for program popularity prediction in broadcast TV industries,” IEEE, 2017.