Friday, Oct 22, 2021
Outlook.com

IIT Mandi proposes sampling techniques for accurate insights in real world high dimensional datasets

outlookindia.com
New Delhi, Jan 25 (PTI) A researcher at the Indian Institute of Technology (IIT) Mandi has proposed sampling techniques that provide accurate insights into real-world high-dimensional datasets.

Rameshwar Pratap, Assistant Professor at IIT Mandi, working in collaboration with Microsoft Research India, Bengaluru, and Carnegie Mellon University, Pittsburgh, USA, has proposed simple, efficient, and accurate sampling techniques for analysing such datasets.

According to Pratap, recent technological advances have generated large volumes of high-dimensional data from sources such as the Internet of Things (IoT), the World Wide Web, bioinformatics, finance, social networks, smart home appliances, smart cities and 5G communication, among others.

"These high-dimensional datasets need to be carefully analysed to infer interesting insights that can be useful for making important decisions," Pratap said. "Typically, several algorithmic techniques such as clustering, regression and classification are used to analyse big data. However, one of the major challenges with real-world datasets is that they contain outliers or anomalies, which can confuse these algorithms and consequently lead to incorrect insights," he said.

To address these challenges, the researchers have come up with simple techniques for two fundamental unsupervised learning tasks: clustering and dimensionality reduction (via principal component analysis).

"As both clustering and principal component analysis are fundamental subroutines in many artificial intelligence applications, such as text, audio, video and image compression, scalable recommendation systems, faster duplicate detection and scalable indexing for faster search, among many others, our results can potentially yield accurate and scalable solutions in all of these applications, even when the data is noisy," he said.

In clustering, the research focused on the well-known k-means algorithm. "In this clustering, the aim is to group the data points into k clusters such that the points in each cluster are closer to their own cluster centre than to the other centres. Finding the optimal clustering is hard," he said.
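As a rough illustration of the objective Pratap describes, the sketch below scores a candidate set of centres with the standard k-means cost: each point contributes the squared Euclidean distance to its nearest centre, and the best clustering minimises the total. Minimising this cost exactly is NP-hard in general, which is why approximate methods are used. The function names are hypothetical and the code is not taken from the researchers' work.

```python
from typing import List, Sequence

Point = Sequence[float]


def squared_distance(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two points of the same dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points: Sequence[Point], centres: Sequence[Point]) -> List[int]:
    """Index of the nearest centre for every point (ties go to the lower index)."""
    return [
        min(range(len(centres)), key=lambda j: squared_distance(p, centres[j]))
        for p in points
    ]


def kmeans_cost(points: Sequence[Point], centres: Sequence[Point]) -> float:
    """k-means objective: sum over points of the squared distance to the nearest centre."""
    return sum(min(squared_distance(p, c) for c in centres) for p in points)


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centres = [(0.0, 0.5), (10.0, 10.5)]
print(assign(points, centres))       # [0, 0, 1, 1]
print(kmeans_cost(points, centres))  # 1.0
```

Each point here sits 0.5 units from its nearest centre, so it adds 0.25 to the cost; a single badly placed centre would make the cost far larger.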

Sharing details of his research, Pratap said that, to address this challenge, efficient sampling algorithms have been proposed whose output is close to the optimal solution, giving an approximate representative for each cluster centre.
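The article does not spell out the researchers' sampling algorithm. A widely used example of the general idea is k-means++ seeding (Arthur and Vassilvitskii, 2007), which picks each new centre at random with probability proportional to its squared distance from the nearest centre chosen so far; in expectation this gives an O(log k)-approximation to the optimal k-means cost. The minimal sketch below implements that standard seeding, not the method proposed at IIT Mandi, and `d2_seeding` is a hypothetical name.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of the same dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def d2_seeding(points: Sequence[Point], k: int, rng: random.Random) -> List[Point]:
    """k-means++ seeding: choose k centres by D^2 (squared-distance) sampling."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # The first centre is chosen uniformly at random.
    centres = [rng.choice(points)]
    # d2[i] is the squared distance from points[i] to its nearest chosen centre.
    d2 = [squared_distance(p, centres[0]) for p in points]
    while len(centres) < k:
        if sum(d2) == 0:
            # Every point already coincides with a centre; fall back to a uniform choice.
            centres.append(rng.choice(points))
        else:
            # Points far from all current centres are proportionally more likely.
            (idx,) = rng.choices(range(len(points)), weights=d2, k=1)
            centres.append(points[idx])
        d2 = [min(old, squared_distance(p, centres[-1])) for old, p in zip(d2, points)]
    return centres


points = [(0.0, 0.0), (0.0, 1.0), (100.0, 100.0), (100.0, 101.0)]
print(d2_seeding(points, k=2, rng=random.Random(0)))
```

In practice such seeds are usually refined afterwards with Lloyd's iterations, which alternate between assigning points to their nearest centre and moving each centre to the mean of its points.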

"However, the presence of outliers can confuse the sampling algorithm, which in turn may output a solution that is very far from the optimal," he said. To address this, the researchers have proposed a sampling algorithm that can efficiently find a close-to-optimal clustering solution even when outliers are present in the dataset.
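To see why a single outlier can derail this kind of sampling, consider plain D² sampling as used in k-means++ seeding (a standard technique, not the researchers' algorithm). The hypothetical helper below computes the probability that each point is picked as the next centre once one centre is fixed. With two small groups of inliers and one far-away point, the outlier absorbs more than 99.9% of the probability mass, so the second centre is very likely to land on it rather than on the second genuine cluster.

```python
from typing import List, Sequence

Point = Sequence[float]


def squared_distance(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two points of the same dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def d2_probabilities(points: Sequence[Point], centre: Point) -> List[float]:
    """Probability of picking each point next under D^2 sampling, given one centre."""
    d2 = [squared_distance(p, centre) for p in points]
    total = sum(d2)
    if total == 0:
        raise ValueError("all points coincide with the centre")
    return [d / total for d in d2]


# Two small inlier groups near (0, 0) and (5, 0), plus one outlier at (1000, 0).
points = [(0, 0), (1, 0), (0, 1), (5, 0), (6, 0), (5, 1), (1000, 0)]
probs = d2_probabilities(points, centre=(0, 0))
print(f"outlier: {probs[-1]:.4f}, second group: {sum(probs[3:6]):.4f}")
```

The article does not describe how the proposed algorithm avoids this failure mode; the example only illustrates the problem it is meant to solve.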

Turning to dimensionality reduction, Pratap said sampling algorithms for principal component analysis face a similar problem. "As the presence of outliers can confuse these sampling algorithms and the resulting solution can be significantly worse, we have proposed efficient and accurate sampling algorithms which find close-to-optimal principal components even when outliers are present in the datasets," he said.
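Sampling is also a standard way to speed up principal component analysis. One classical scheme, length-squared row sampling (Frieze, Kannan and Vempala), draws rows of the data matrix with probability proportional to their squared norm and rescales them so that the sample's Gram matrix is an unbiased estimate of the full one. The minimal sketch below, with hypothetical function names, implements that textbook scheme rather than the researchers' outlier-robust algorithm, and shows how a single large-norm outlier dominates the sampling distribution.

```python
import math
import random
from typing import List, Sequence

Row = Sequence[float]


def length_squared_probabilities(rows: Sequence[Row]) -> List[float]:
    """Probability of sampling each row, proportional to its squared Euclidean norm."""
    norms = [sum(x * x for x in r) for r in rows]
    total = sum(norms)
    if total == 0:
        raise ValueError("matrix has no non-zero rows")
    return [n / total for n in norms]


def sample_rows(rows: Sequence[Row], s: int, rng: random.Random) -> List[List[float]]:
    """Draw s rows i.i.d. by length-squared sampling and rescale each chosen row
    by 1/sqrt(s * p_i), so that E[S^T S] equals A^T A for the sampled matrix S."""
    probs = length_squared_probabilities(rows)
    chosen = rng.choices(range(len(rows)), weights=probs, k=s)
    return [[x / math.sqrt(s * probs[i]) for x in rows[i]] for i in chosen]


# Inliers lie along the x-axis; one outlier row points along the y-axis.
rows = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (0.0, 100.0)]
probs = length_squared_probabilities(rows)
print(probs[-1])  # share of the sampling mass that falls on the outlier row
sketch = sample_rows(rows, s=3, rng=random.Random(0))
```

Here the outlier receives more than 99% of the sampling mass, so the sample is very likely to consist of copies of that one row. Even the exact Gram matrix A^T A = diag(14, 10000) has its top principal component along the y-axis, although every inlier lies on the x-axis, which is the kind of distortion an outlier-robust method has to avoid.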



Disclaimer: This story has not been edited by Outlook staff and is auto-generated from news agency feeds. Source: PTI
