Machine learning models identify applications that

Image: Fadi Mohsen, Assistant Professor in the Information Systems Group at the Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence, University of Groningen. He is the first author of a paper describing the first data-driven models that predict whether mobile apps will be removed from the store.

Credit: University of Groningen

A considerable percentage of new apps in the Google Play Store are removed for violating the store's guidelines. This is inconvenient for the users of these apps, who may lose their in-app data. Computer scientists at the University of Groningen have developed two machine learning models that can predict the risk of a new app being removed, both before and after it is uploaded to the store. These models can help both developers and users. The details of the project are described in an article published in the journal Systems and Soft Computing on 29 September.

The Google Play Store sets rules and requirements that developers must adhere to. Submitted apps are published in the store immediately, but Google takes some time to review them, after which it removes apps that violate the guidelines. Developers whose apps have been removed more than once may be banned from the store.


“My research interests are in issues of digital privacy and security,” says Fadi Mohsen, assistant professor in the Information Systems Group at the University of Groningen’s Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence. Given the consequences of app removal for developers and users, he wanted to create a system that could predict whether new apps will be removed.

“There have been attempts to do this before, but these usually focused on specific types of apps that were removed for specific reasons, such as containing malware,” says Mohsen. “We wanted to develop a general model that predicts the chances of an app being removed, regardless of the type of app or the reason for removal.” Moreover, previous attempts focused only on users, whereas Mohsen also wanted to help developers who have unintentionally broken the guidelines.

Source code

The first step was to gather a large dataset of both removed and non-removed apps. “We collected metadata, including the descriptions that developers provided to the store, for approximately two million apps. After that, we downloaded the source code of half of these apps.” Mohsen and his colleagues then tracked the status of those apps in the store for six months to see which ones were removed. “In our selection, that was the case for 56% of them.” It took them 26 months to finalize the dataset used to generate the machine learning models.


The algorithm they used is called Extreme Gradient Boosting (XGBoost). “It’s the best machine learning algorithm for these kinds of problems,” says Mohsen. The algorithm was used to create two predictive models: one for developers and one for users. The user model was built on 47 features, and on a test dataset it predicted the removal of a given app with 79.2% accuracy. Because some of these features, such as app store ratings, are not available before an app is submitted to the store, the developer model was based on only 37 features, and its accuracy was slightly lower: 76.2%.
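The article does not list the features or describe the training setup, so the following is only a rough sketch of the idea behind the two models: one boosting procedure trained once on every feature (the "user" model) and once on only the features available before submission (the "developer" model). It uses plain first-order gradient boosting with decision stumps in standard-library Python, not XGBoost itself, which adds second-order updates, regularization and deeper trees. The data are synthetic: the five features, the hypothetical removal rule, and the use of feature 4 as a stand-in for a post-publication signal such as store ratings are all assumptions made for illustration.

```python
import math
import random


def make_synthetic_apps(n, seed):
    """Generate hypothetical apps: features 0-3 are 'pre-submission', feature 4 is 'post-publication'."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.random() for _ in range(5)]
        # Hypothetical rule: risk rises with feature 0 and falls with feature 4 (a rating-like signal).
        score = 2.0 * row[0] - 2.0 * row[4] + rng.gauss(0.0, 0.2)
        X.append(row)
        y.append(1 if score > 0 else 0)  # 1 = removed, 0 = still in the store
    return X, y


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals, features):
    """Fit a one-split regression tree to the residuals, searching only the allowed features."""
    best = None
    for f in features:
        values = sorted(row[f] for row in X)
        for q in range(1, 10):  # candidate thresholds at the deciles keep the search cheap
            t = values[q * len(values) // 10]
            left = [r for row, r in zip(X, residuals) if row[f] <= t]
            right = [r for row, r in zip(X, residuals) if row[f] > t]
            if not left or not right:
                continue
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, left_mean, right_mean)
    return best[1:]  # (feature, threshold, left value, right value)


def train_boosted(X, y, features, rounds=30, lr=0.3):
    """First-order gradient boosting for log loss: each stump fits the negative gradient y - p."""
    positives = sum(y)
    base = math.log((positives + 1) / (len(y) - positives + 1))  # smoothed prior log-odds
    F = [base] * len(X)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
        f, t, lv, rv = fit_stump(X, residuals, features)
        stumps.append((f, t, lv, rv))
        F = [fi + lr * (lv if row[f] <= t else rv) for fi, row in zip(F, X)]
    return base, lr, stumps


def predict(model, row):
    base, lr, stumps = model
    z = base + sum(lr * (lv if row[f] <= t else rv) for f, t, lv, rv in stumps)
    return 1 if sigmoid(z) >= 0.5 else 0


def accuracy(model, X, y):
    return sum(predict(model, row) == yi for row, yi in zip(X, y)) / len(y)


X_train, y_train = make_synthetic_apps(300, seed=1)
X_test, y_test = make_synthetic_apps(300, seed=2)
# Hypothetical "user" model: every feature, including the post-publication one.
user_model = train_boosted(X_train, y_train, features=[0, 1, 2, 3, 4])
# Hypothetical "developer" model: only features known before submission.
dev_model = train_boosted(X_train, y_train, features=[0, 1, 2, 3])
user_acc = accuracy(user_model, X_test, y_test)
dev_acc = accuracy(dev_model, X_test, y_test)
print(f"user model: {user_acc:.3f}, developer model: {dev_acc:.3f}")
```

On this synthetic data the model that can see the post-publication feature should score higher, which mirrors the direction of the gap between the two models in the study, though not its size; the printed numbers say nothing about the real dataset.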

“We can now predict the future of an app with reasonable accuracy,” Mohsen says. The next step is to develop an interface that developers and users can use to assess an app's risk of removal. “This is valuable for developers, as they could be banned from the Google Play Store if they repeatedly violate the guidelines,” Mohsen says, “but also for users, as they generate data with their apps, which they will lose if an app is suddenly removed.”


Other researchers will also benefit from this work: “The rich dataset we generated for our paper has been made publicly available via the Dutch repository.” This means that anyone can try to improve on the results obtained by Mohsen and his colleagues. “We are looking forward to the competition, to see if they can beat us. This would further increase the benefits for users and developers.”

Reference: Fadi Mohsen, Dimka Karastoyanova and George Azzopardi: Early detection of infringing mobile apps: a data-driven predictive model approach. Systems and Soft Computing, September 29, 2022.
