Bluesky Helps Reduce Query Costs with Machine Learning Cost Governance Algorithms



Query optimization is not necessarily new. Cloud cost governance to identify and control query spend is also not new. What’s new, however, is Bluesky, a cloud-based, Snowflake-focused workload optimization provider that launched earlier this month to help organizations achieve these goals.

One of the critical elements of the company’s approach is “the algorithms we created ourselves, based on the 15 years of experience each of us has tuning workloads at Google, Uber, etc.,” said Mingsheng Hong, CEO of Bluesky.

Hong was previously head of engineering for machine learning execution capabilities at Google, a role in which he worked extensively with TensorFlow. He co-founded Bluesky with CTO Zheng Shao, formerly a distinguished engineer at Uber, where Shao specialized in big data architecture and cost reduction.

The algorithms Hong refers to analyze queries at scale, primarily in cloud environments, and determine how to optimize the associated workloads to reduce their cost. “Individual queries rarely have commercial value,” Hong observed. “It’s the combination of them that together achieves certain business goals, like transforming data and delivering business insights.”


What is particularly interesting is that Bluesky combines statistical and symbolic artificial intelligence (AI) approaches for this task, a concrete illustration of how fusing the two could shape the future of AI in the enterprise.

Machine Learning Query Cost Governance

Bluesky strengthens cost governance in several ways by optimizing the time and resources spent querying popular cloud data sources. The solution can limit redundant query work through incremental materialization, which is useful for queries that recur at set intervals, such as hourly, daily, or weekly.

According to Hong, when monthly revenue figures are analyzed, for example, this capability lets systems “materialize the previous calculation and only calculate the incremental part,” that is, the delta since the last calculation. Applied at scale, this can save a considerable amount of money and compute resources.
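A minimal sketch of the idea in Python, assuming an append-only, time-ordered sales table with no late-arriving rows. The `IncrementalSum` class, its watermark field, and the sample data are hypothetical illustrations, not Bluesky's implementation: each refresh folds only rows newer than the stored watermark into the previously materialized total.

```python
import bisect
from datetime import datetime


class IncrementalSum:
    """Hypothetical incrementally materialized SUM over an append-only table."""

    def __init__(self) -> None:
        self.total = 0.0               # materialized result of earlier refreshes
        self.watermark = datetime.min  # newest timestamp already folded into total
        self.rows_aggregated = 0       # rows actually summed across all refreshes

    def refresh(self, timestamps: list[datetime], amounts: list[float]) -> float:
        # Assumes timestamps are sorted ascending and rows never arrive late,
        # so everything after the watermark is the delta since the last run.
        start = bisect.bisect_right(timestamps, self.watermark)
        delta = amounts[start:]
        self.total += sum(delta)
        self.rows_aggregated += len(delta)
        if delta:
            self.watermark = timestamps[-1]
        return self.total


ts = [datetime(2022, 7, d) for d in (1, 2, 3)]
amounts = [100.0, 50.0, 25.0]
revenue = IncrementalSum()
revenue.refresh(ts, amounts)  # first run sums all 3 rows

ts.append(datetime(2022, 7, 4))
amounts.append(10.0)
revenue.refresh(ts, amounts)  # second run sums only the 1 new row
print(revenue.total, revenue.rows_aggregated)  # 185.0 4
```

Recomputing from scratch on both runs would have summed 7 rows instead of 4, and the gap widens as history grows. Handling late or updated rows would need a more careful design, for example reprocessing a trailing window.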

Tuning Recommendations

Bluesky provides detailed visibility into query patterns and their resource consumption. The solution offers a rolling list of the most expensive query patterns, along with other techniques to “show people how much they’re spending,” Hong said. “We break it down into individual users, teams, projects, cost centers, etc., so everyone knows how much others are spending.”
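A minimal sketch of this kind of reporting, assuming a query log in which each entry carries a user, a team, the SQL text, and a cost figure. `QueryRecord`, `normalize`, `top_patterns`, and `spend_by` are hypothetical names, and the regular expressions that strip literals are a simplification; a production tool would more likely fingerprint queries with a proper SQL parser.

```python
import re
from collections import defaultdict
from typing import NamedTuple


class QueryRecord(NamedTuple):
    user: str
    team: str
    sql: str
    credits: float  # hypothetical per-query cost, e.g. warehouse credits


def normalize(sql: str) -> str:
    """Map queries that differ only in literal values to one pattern (simplified)."""
    sql = re.sub(r"'[^']*'", "?", sql)            # string literals
    sql = re.sub(r"\b\d+(?:\.\d+)?\b", "?", sql)  # numeric literals
    return re.sub(r"\s+", " ", sql).strip().lower()


def top_patterns(log: list[QueryRecord], n: int = 3) -> list[tuple[str, float]]:
    """Rank query patterns by total cost, most expensive first."""
    cost: dict[str, float] = defaultdict(float)
    for rec in log:
        cost[normalize(rec.sql)] += rec.credits
    return sorted(cost.items(), key=lambda kv: kv[1], reverse=True)[:n]


def spend_by(log: list[QueryRecord], dimension: str) -> dict[str, float]:
    """Roll up spend by any QueryRecord field, e.g. 'user' or 'team'."""
    totals: dict[str, float] = defaultdict(float)
    for rec in log:
        totals[getattr(rec, dimension)] += rec.credits
    return dict(totals)


log = [
    QueryRecord("ana", "finance", "SELECT SUM(amount) FROM sales WHERE day = '2022-07-01'", 4.0),
    QueryRecord("ana", "finance", "SELECT SUM(amount) FROM sales WHERE day = '2022-07-02'", 4.5),
    QueryRecord("raj", "growth", "SELECT * FROM events WHERE user_id = 42", 1.5),
]
print(top_patterns(log))
print(spend_by(log, "team"))  # {'finance': 8.5, 'growth': 1.5}
```

The two revenue queries differ only in their date literal, so they collapse into a single pattern whose combined cost ranks first.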

Bluesky's algorithms combine statistical and non-statistical AI approaches to allocate query costs based on query profiles. A query profile captures the time, CPU, and memory a specific query requires. The algorithms use this information to issue tuning recommendations, such as changes to query code or data layout, that reduce the resources queries consume. “Optimization is not just about compute,” Hong noted. “Additionally, we organize the storage: the indexes of the tables, the way you lay out the tables; then there are the warehouse settings and the system settings that we modify.”
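A minimal sketch of profile-based allocation and rule-driven recommendations, assuming each query's profile records time, CPU, memory, and partition-pruning statistics. The `QueryProfile` fields, the CPU-proportional split of a shared warehouse bill, and the thresholds are all hypothetical choices for illustration, not Bluesky's algorithms.

```python
from dataclasses import dataclass


@dataclass
class QueryProfile:
    query_id: str
    elapsed_s: float         # wall-clock time (unused by the simple rules below)
    cpu_s: float             # CPU time consumed
    peak_mem_gb: float       # peak memory
    partitions_scanned: int
    partitions_total: int


def allocate(bill: float, profiles: list[QueryProfile]) -> dict[str, float]:
    """Split a shared warehouse bill across queries in proportion to CPU time."""
    total_cpu = sum(p.cpu_s for p in profiles)
    if total_cpu == 0:
        return {p.query_id: 0.0 for p in profiles}
    return {p.query_id: bill * p.cpu_s / total_cpu for p in profiles}


# Hypothetical thresholds; a real tool would tune or learn these.
SCAN_RATIO_LIMIT = 0.8
MEMORY_LIMIT_GB = 16.0
CPU_LIMIT_S = 600.0


def recommend(p: QueryProfile) -> list[str]:
    """Map profile symptoms to the tuning levers named in the article."""
    tips = []
    if p.partitions_total and p.partitions_scanned / p.partitions_total > SCAN_RATIO_LIMIT:
        tips.append("data layout: nearly every partition is scanned; consider clustering on filter columns")
    if p.peak_mem_gb > MEMORY_LIMIT_GB:
        tips.append("warehouse settings: high memory use; consider a larger warehouse")
    if p.cpu_s > CPU_LIMIT_S:
        tips.append("query code: high CPU time; review joins and add selective filters")
    return tips


profiles = [
    QueryProfile("q1", elapsed_s=40, cpu_s=30, peak_mem_gb=4, partitions_scanned=95, partitions_total=100),
    QueryProfile("q2", elapsed_s=12, cpu_s=10, peak_mem_gb=2, partitions_scanned=3, partitions_total=100),
]
print(allocate(10.0, profiles))  # {'q1': 7.5, 'q2': 2.5}
print(recommend(profiles[0]))
```

Splitting by CPU time is only one possible allocation key; memory or elapsed time could be weighted in as well.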

Rules and supervised machine learning

Significantly, the algorithms that produce these recommendations and analyze the factors Hong mentioned use both rule-based approaches and machine learning. In doing so, they combine AI's classic knowledge-representation foundation with its statistical one. Such pairings (sometimes called neuro-symbolic AI) have many use cases in natural language technologies. Gartner has described the combination of these two forms of AI as part of a larger composite AI movement. According to Hong, rules are a natural fit for query optimization.

“Query optimization starts with rules, and you enrich them with the cost model,” he explained. “There are cases where pushing down a filter is always a good idea. So that’s a good rule. Eliminating a full table scan is always good. That’s a rule.”

Supervised learning comes in when rules depend on cost conditions or on the cost model. For example, eliminating queries with low ROI is a useful rule. Supervised learning techniques can determine which queries fall into this category, for example by examining the value of queries over the last week, before rules are applied to discard them. “If a query fails more than 98% of the time in the last seven days, you can put that query pattern in a penalty box,” Hong remarked.
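The penalty-box rule Hong describes can be sketched as a sliding-window failure-rate check. This assumes a run history of `(pattern, timestamp, succeeded)` tuples; the function name, the `min_runs` guard, and the sample data are hypothetical, and the supervised model that would flag low-ROI patterns is not shown.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def penalty_box(
    runs: list[tuple[str, datetime, bool]],
    now: datetime,
    window: timedelta = timedelta(days=7),
    max_failure_rate: float = 0.98,
    min_runs: int = 50,  # hypothetical guard against tiny samples
) -> set[str]:
    """Return query patterns whose failure rate in the window exceeds the limit."""
    failures: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for pattern, ts, succeeded in runs:
        if now - ts <= window:  # only runs from the last seven days count
            totals[pattern] += 1
            if not succeeded:
                failures[pattern] += 1
    return {
        p for p, n in totals.items()
        if n >= min_runs and failures[p] / n > max_failure_rate
    }


now = datetime(2022, 7, 8)
runs = [("nightly_export", now - timedelta(hours=i), i == 0) for i in range(100)]
runs += [("daily_revenue", now - timedelta(hours=i), i % 40 != 0) for i in range(100)]
runs.append(("nightly_export", now - timedelta(days=30), True))  # outside window
print(penalty_box(runs, now))  # {'nightly_export'}
```

Here `nightly_export` fails 99 of its 100 recent runs (99%) and is boxed, while `daily_revenue` fails 3 of 100 and is left alone; the 30-day-old run is ignored.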

Controlling costs

The need to reduce business costs, especially in multicloud and hybrid cloud environments, will likely grow over the next few years. Cost governance and workload optimization methods that optimize queries help organizations understand where costs are rising and how to reduce them. Automation that uses both statistical and non-statistical AI to identify these areas, while suggesting fixes, may be a harbinger of where enterprise AI is headed.

VentureBeat’s mission is to be a digital public square for technical decision makers to learn about transformative enterprise technology and conduct transactions.
