image credits: Christopher Burns @ Unsplash

Real-world problems

In many real-world problems we have to process and analyse large amounts of text data, for example in text mining, spam filtering, product recommendation and online advertising.

Such data is usually characterised by high dimensionality. For example, the Google News model is trained on a dataset of about 100 billion words and contains 300-dimensional vectors for 3 million words and phrases.

In some cases we analyse troves of documents to identify similar items in high-dimensional data, for example:

  • Document classification
  • Plagiarism
  • Chatbots

A common issue across all such problems is finding near neighbours…
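As a rough illustration of what finding a near neighbour means for text, the sketch below compares documents by cosine similarity over simple word counts and picks the closest one by brute force. The function names and the sample documents are made up for this example, and a real system working with high-dimensional vectors would likely use an approximate method rather than comparing against every document.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Turn a text into a sparse word-count vector (a Counter)."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    # Counter returns 0 for missing words, so absent terms add nothing.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def nearest_neighbour(query, documents):
    """Return the index of the document most similar to the query (brute force)."""
    query_vec = bag_of_words(query)
    scores = [cosine_similarity(query_vec, bag_of_words(d)) for d in documents]
    return max(range(len(documents)), key=lambda i: scores[i])

# Hypothetical sample documents, not taken from any real dataset.
documents = [
    "cheap pills buy now",
    "quarterly sales forecast for the retail team",
    "meeting notes on the sales forecast",
]
best = nearest_neighbour("updated sales forecast", documents)
```

The brute-force loop costs one comparison per document, which is the part that stops scaling once the collection and the vector dimension grow.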


Today, while most industries are rapidly taking up social media to listen to and engage with customers, the pharmaceutical industry has remained cautious in its social endeavours. Owing to the lack of clarity on social media guidelines, pharmaceutical firms balk at extending their social presence in a highly regulated environment.

The Digital Health Monitor report analysed the online presence of 25 pharmaceutical firms, including names such as Pfizer, GSK and Bayer. The online presence of these firms was recorded across branded websites, blogs and apps, as well as on social media channels such as Facebook, Twitter and LinkedIn.


The Black Square is an iconic painting by Russian artist Kazimir Malevich. The first version was done in 1915. The Black Square continues to impress art historians even today. However, it did not impress the then Soviet government and was kept in such poor conditions that it suffered significant cracking and decay.

Complex machine learning algorithms can be mathematical works of art, but if these black box algorithms fail to impress and build trust with users, they might be ignored like Malevich’s Black Square.


Drug discovery is a long, challenging and often futile process for pharmaceutical companies. It’s estimated that two-thirds of all clinical trials to find new medicines ultimately fail.


image credits: Drew Beamer @ Unsplash

“When you come to the end of your rope, tie a knot and hang on.”

- Franklin D. Roosevelt, 32nd US president, who led the nation through the Great Depression

The coronavirus outbreak has rendered the 2020 and 2021 revenue and sales forecasts outdated. All industries have seen their top line plummet, except a few such as critical goods, groceries and online collaboration tools. Businesses will need to realign their sales and marketing strategies and reevaluate their sales forecasts as they navigate this downturn.

The old adage of “cash flow is king” couldn’t be more relevant in this economic downturn…


To analyze large spatial data sets, we need to partition geographic areas into identifiable grid cells. For example, we may divide a large region into smaller units for indexing purposes, or slice the geographic area into subunits over which we want to summarize a spatial variable.

Such grids are usually composed of equilateral triangles, squares or hexagons, as these are the only three regular polygons that can tessellate, i.e. cover an area by repeated use of a single shape without gaps or overlaps.
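A minimal sketch of the idea, using the simplest of the three shapes: each point is assigned to a square cell by flooring its coordinates, and a variable is then averaged per cell. The function names, the planar x/y coordinates and the sample points are assumptions made for illustration. Real geographic data would first need a suitable projection, and a hexagonal grid needs a more involved cell lookup.

```python
import math
from collections import defaultdict

def square_cell(x, y, cell_size):
    """Return the (column, row) index of the square cell containing (x, y)."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))

def summarise_by_cell(points, cell_size):
    """Average the value of all points that fall in each grid cell."""
    totals = defaultdict(lambda: [0.0, 0])  # cell -> [sum of values, count]
    for x, y, value in points:
        cell = square_cell(x, y, cell_size)
        totals[cell][0] += value
        totals[cell][1] += 1
    return {cell: total / count for cell, (total, count) in totals.items()}

# Hypothetical (x, y, value) observations in planar coordinates.
points = [
    (0.5, 0.5, 10.0),
    (0.9, 0.1, 20.0),
    (1.5, 0.2, 30.0),
    (-0.1, 0.3, 40.0),
]
cell_means = summarise_by_cell(points, cell_size=1.0)
```

Using `math.floor` rather than integer truncation keeps cells consistent for negative coordinates, so a point at x = -0.1 lands in column -1 instead of sharing column 0.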


Remdesivir, the first drug shown to be effective against the coronavirus, will be sold for $520 per vial, or $3,120 per treatment course. Drug pricing is a complex equation that weighs R&D expenditure, marketing, manufacturing and distribution costs against the benefit delivered to patients and society.

Increasing healthcare costs and drug prices at the individual patient level and the overall economic level have been a concern for governments, insurers and payers all over the world.

In the UK, the leader of the opposition party, Jeremy Corbyn, has vowed to redesign the system to serve public health, not private wealth.

“We…


Twitter is a powerful social networking tool and search engine in which you can easily find current trends, information and news about virtually any topic.

The possibility of reaching hundreds of millions of leads through a free social media platform sounds exciting, right? But where do you start and how do you actually generate engaging content that people will want to interact with?

Where to start?

First things first: if you don’t have a Twitter account, register and get one today at twitter.com.

Your profile: Your display picture will be one of the first things people see from you. So…


The coronavirus outbreak has rendered the 2020 and 2021 sales forecasts outdated. All industries have seen their top line plummet, except a few such as critical goods, groceries and online collaboration tools. Businesses will need to realign their sales and marketing strategies and reevaluate their sales forecasts as they navigate this downturn.

The old adage of “cash flow is king” couldn’t be more relevant in this economic downturn. Businesses need to prioritise the customers, services and products they focus on during these times in order to deliver services remotely, retain customers and sustain revenue.

A data-driven approach to take stock of


Feature Engineering for customer churn.

A churn model can help you determine the most significant reasons customers decide to stop using your product or service. It is up to the data scientist building the model, however, to decide which factors to train on, test and ultimately include or exclude, a process called feature engineering.

In the machine learning workflow, feature engineering is the most creative part for data scientists. …
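As a hedged sketch of what such a step can look like, the function below derives a few candidate churn features from one raw customer record. Every field name (`signup_date`, `last_order_date`, `order_count`, `complaints`) and every derived feature is a hypothetical choice for this example, not a list taken from the article. Which features actually matter depends on the business and has to be tested against the model.

```python
from datetime import date

def churn_features(customer, as_of):
    """Derive model-ready features from one raw customer record."""
    tenure_days = (as_of - customer["signup_date"]).days
    days_since_last_order = (as_of - customer["last_order_date"]).days
    # Treat tenure as at least one month to avoid inflated rates for new customers.
    months_active = max(tenure_days / 30.0, 1.0)
    return {
        "tenure_days": tenure_days,
        "days_since_last_order": days_since_last_order,
        "orders_per_month": customer["order_count"] / months_active,
        "has_complained": int(customer["complaints"] > 0),
    }

# Hypothetical customer record used only to exercise the function.
customer = {
    "signup_date": date(2020, 1, 1),
    "last_order_date": date(2020, 5, 30),
    "order_count": 6,
    "complaints": 0,
}
features = churn_features(customer, as_of=date(2020, 6, 29))
```

Passing `as_of` explicitly, instead of reading today's date inside the function, keeps the features reproducible when the model is retrained on historical snapshots.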

siddhesh dhuri

Helping businesses and small ventures #DoMore with data. Powering growth with analytics, NLP and Social Selling. Founder at Orox.ai
