benchmark - Dr. Bob Sutor

Notable and Interesting Recent AI News, Articles, and Papers for Thursday, July 18, 2024

July 18, 2024 (updated July 24, 2024) by Bob Sutor

A selection of the most important recent news, articles, and papers about AI.

News, Articles, and Analyses

IBM text-to-SQL generator tops leaderboard – IBM Research

(Tuesday, July 02, 2024) “IBM’s generative AI solution takes a top spot on the BIRD benchmark for handling complex database queries”

Reaffirming IBM’s commitment to the Rome Call for AI ethics – IBM Research

(Monday, July 15, 2024) “IBM joined representatives from many of the world’s major religions in Japan to discuss ethical AI development.”

AMD takes a deep dive into architecture for the AI PC chips | VentureBeat

Author: Dean Takahashi

(Monday, July 15, 2024) “Advanced Micro Devices executives revealed the details of the chipmaker’s latest AI PC architecture, which includes a new neural processing unit (NPU) in the company’s latest AMD Ryzen AI chips. The company announced the latest AMD Ryzen […]”

MathΣtral | Mistral AI | Frontier AI in your hands

(Tuesday, July 16, 2024) “As a tribute to Archimedes, whose 2311th anniversary we’re celebrating this year, we are proud to release our first Mathstral model, a specific 7B model designed for math reasoning and scientific discovery. The model has a 32k context window published under the Apache 2.0 license.”

AI in gaming: Developers worried by generative tech

“In a struggling games industry AI has been hailed as a possible saviour. But not everyone’s convinced.”

Technical Papers and Preprints

[2407.12690] The Dual Imperative: Innovation and Regulation in the AI Era

Author: Carvão, Paulo

(Thursday, May 23, 2024) “This article addresses the societal costs associated with the lack of regulation in Artificial Intelligence and proposes a framework combining innovation and regulation. Over fifty years of AI research, catalyzed by declining computing costs and the proliferation of data, have propelled AI into the mainstream, promising significant economic benefits. Yet, this rapid adoption underscores risks, from bias amplification and labor disruptions to existential threats posed by autonomous systems. The discourse is polarized between accelerationists, advocating for unfettered technological advancement, and doomers, calling for a slowdown to prevent dystopian outcomes. This piece advocates for a middle path that leverages technical innovation and smart regulation to maximize the benefits of AI while minimizing its risks, offering a pragmatic approach to the responsible progress of AI technology. Technical invention beyond the most capable foundation models is needed to contain catastrophic risks. Regulation is required to create incentives for this research while addressing current issues.”

[2407.12043] The Art of Saying No: Contextual Noncompliance in Language Models

Authors: Brahman, Faeze; Kumar, Sachin; Balachandran, Vidhisha; Dasigi, Pradeep; Pyatkin, Valentina; Ravichander, Abhilasha; Wiegreffe, Sarah; Dziri, Nouha; Chandu, Khyathi; Hessel, Jack; Tsvetkov, Yulia; Smith, Noah A.; Choi, Yejin; Hajishirzi, Hannaneh

(Tuesday, July 02, 2024) “Chat-based language models are designed to be helpful, yet they should not comply with every user request. While most existing work primarily focuses on refusal of ‘unsafe’ queries, we posit that the scope of noncompliance should be broadened. We introduce a comprehensive taxonomy of contextual noncompliance describing when and how models should not comply with user requests. Our taxonomy spans a wide range of categories including incomplete, unsupported, indeterminate, and humanizing requests (in addition to unsafe requests). To test noncompliance capabilities of language models, we use this taxonomy to develop a new evaluation suite of 1000 noncompliance prompts. We find that most existing models show significantly high compliance rates in certain previously understudied categories with models like GPT-4 incorrectly complying with as many as 30% of requests. To address these gaps, we explore different training strategies using a synthetically-generated training set of requests and expected noncompliant responses. Our experiments demonstrate that while direct finetuning of instruction-tuned models can lead to both over-refusal and a decline in general capabilities, using parameter efficient methods like low rank adapters helps to strike a good balance between appropriate noncompliance and other capabilities.”

Notable and Interesting Recent AI News, Articles, and Papers for Monday, July 15, 2024

July 15, 2024 (updated July 24, 2024) by Bob Sutor

A selection of the most important recent news, articles, and papers about AI.

News, Articles, and Analyses

Developers get by with a little help from AI: Stack Overflow Knows code assistant pulse survey results – Stack Overflow

Gen AI and beyond: Where else to focus now | McKinsey

(Friday, July 12, 2024) “Yes, gen AI can be dazzling. But to deliver value, leaders will have to look beyond center stage.”

Designing for Education with Artificial Intelligence: An Essential Guide for Developers – Office of Educational Technology

“Informing product leads and their teams of innovators, designers, and developers as they work toward safety, security, and trust while creating AI products and services for use in education.”

IBM’s AI, Open-Source Granite Models & Sports Technology – The Futurum Group

Author: Steven Dickens

“Chief Technology Advisor Steven Dickens shares insights on how IBM uses AI to enhance sports, democratizing innovation through open-source.”

Technical Papers and Preprints

[2407.08488] Lynx: An Open Source Hallucination Evaluation Model

Authors: Ravi, Selvan Sunitha; Mielczarek, Bartosz; Kannappan, Anand; Kiela, Douwe; Qian, Rebecca

(Thursday, July 11, 2024) “Retrieval Augmented Generation (RAG) techniques aim to mitigate hallucinations in Large Language Models (LLMs). However, LLMs can still produce information that is unsupported or contradictory to the retrieved contexts. We introduce LYNX, a SOTA hallucination detection LLM that is capable of advanced reasoning on challenging real-world hallucination scenarios. To evaluate LYNX, we present HaluBench, a comprehensive hallucination evaluation benchmark, consisting of 15k samples sourced from various real-world domains. Our experiment results show that LYNX outperforms GPT-4o, Claude-3-Sonnet, and closed and open-source LLM-as-a-judge models on HaluBench. We release LYNX, HaluBench and our evaluation code for public access.”

[2407.08105] Federated Learning and AI Regulation in the European Union: Who is Responsible? — An Interdisciplinary Analysis

Authors: Woisetschläger, Herbert; Mertel, Simon; Krönke, Christoph; Mayer, Ruben; Jacobsen, Hans-Arno

(Thursday, July 11, 2024) “The European Union Artificial Intelligence Act mandates clear stakeholder responsibilities in developing and deploying machine learning applications to avoid substantial fines, prioritizing private and secure data processing with data remaining at its origin. Federated Learning (FL) enables the training of generative AI Models across data siloes, sharing only model parameters while improving data security. Since FL is a cooperative learning paradigm, clients and servers naturally share legal responsibility in the FL pipeline. Our work contributes to clarifying the roles of both parties, explains strategies for shifting responsibilities to the server operator, and points out open technical challenges that we must solve to improve FL’s practical applicability under the EU AI Act.”

Notable Recent Quantum News, Articles, and Papers for Thursday, July 11, 2024

July 11, 2024 (updated July 13, 2024) by Bob Sutor

A selection of the most important recent news and articles about #quantumcomputing.

Fourier Quantum Process Tomography | npj Quantum Information

(Thursday, May 09, 2024) “The characterization of a quantum device is a crucial step in the development of quantum experiments. This is accomplished via Quantum Process Tomography, which combines the outcomes of different projective measurements to deliver a possible reconstruction of the underlying process. The tomography is typically performed by processing an overcomplete set of measurements and extracting the process matrix from maximum-likelihood estimation. Here, we introduce Fourier Quantum Process Tomography, a technique which requires a reduced number of measurements, and benchmark its performance against the standard maximum-likelihood approach. Fourier Quantum Process Tomography is based on measuring probability distributions in two conjugate spaces for different state preparations and projections. Exploiting the concept of phase retrieval, our scheme achieves a complete and robust characterization of the setup by processing a near-minimal set of measurements. We experimentally test the technique on different space-dependent polarization transformations, reporting average fidelities higher than 90% and significant computational advantage.”

Enabling Quantum Computing with AI | NVIDIA Technical Blog

(Sunday, May 12, 2024) “Building a useful quantum computer in practice is incredibly challenging. Significant improvements are needed in the scale, fidelity, speed, reliability, and programmability of quantum computers to…”

Kipu Quantum Acquires Quantum Computing Platform Built by Anaqor AG to Accelerate Development of Industrially Relevant Quantum Solutions

(Thursday, July 11, 2024) “Kipu Quantum, the worldwide leading quantum software company, announced today the strategic acquisition of PlanQK, the German quantum computing…”

Simulating the universe’s most extreme environments | IBM Quantum Computing Blog

“Scalable techniques for quantum simulations of high-energy physics.”

Quantum in Context: Quantum Companies Rotate in New Leaders – The Futurum Group

“Learn which quantum computing companies have recently replaced their CEOs & reasons Boards of Directors make such changes.”

EDF, Alice & Bob, Quandela and CNRS Partner to Optimize Quantum Computing’s Energy Efficiency

“PARIS, July 10, 2024 — French electric utility company EDF, in collaboration with quantum computing firms Quandela and Alice & Bob, and the French National Centre for Scientific Research (CNRS), has […]”

Study of Quantum Computing Energy Efficiency – The Futurum Group

“Learn about a study in France that will look at the energy efficiency of quantum computing systems versus HPC for well-known algorithms.”

Oxford Ionics breaks global quantum performance records

“Oxford Ionics has demonstrated the highest performing quantum chip in the world, which can be produced at scale in a standard semiconductor fabrication plant.”

From classical FLOPS to quantum CLOPS

November 3, 2021 (updated June 9, 2024) by Bob Sutor

This week IBM Quantum introduced CLOPS, a new architecture-neutral metric for the speed of quantum computers. CLOPS is short for “Circuit Layer Operations per Second.” For those of you familiar with measuring the speed of high-performance classical computers, the acronym is a nod to the metric FLOPS, or “Floating-point Operations per Second.”

CLOPS joins Quantum Volume (a measurement of quantum circuit execution quality) and Number of Qubits (an indication of the size of the problem your system can handle) as one of the three essential ways that we can measure progress toward practical Quantum Advantage. Quantum Advantage signifies the point where quantum computers together with classical computing systems can do better than the classical systems alone.
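To make the metric concrete, here is a minimal Python sketch of how a CLOPS figure can be computed. It assumes the formula from IBM's published definition as I understand it: CLOPS = (M × K × S × D) / elapsed time, where M is the number of circuit templates, K the number of parameter updates per template, S the number of shots per circuit, and D the number of Quantum Volume layers, taken as log2 of the Quantum Volume. The function name `clops`, the default values (M = 100, K = 10, S = 100), and the sample timing are assumptions for illustration, not measurements from any real system.

```python
import math


def clops(elapsed_seconds, quantum_volume, templates=100, updates=10, shots=100):
    """Estimate Circuit Layer Operations per Second.

    Assumed formula: (M * K * S * D) / elapsed time, where
    D = log2(quantum_volume) is the number of Quantum Volume layers.
    The defaults for M, K, and S are assumptions used for illustration.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    if quantum_volume < 2:
        raise ValueError("quantum_volume must be at least 2")
    layers = math.log2(quantum_volume)  # D: layers per circuit
    return templates * updates * shots * layers / elapsed_seconds


# Hypothetical run: Quantum Volume 32 (5 layers), benchmark finished in 500 s.
print(round(clops(elapsed_seconds=500.0, quantum_volume=32)))  # prints 1000
```

The sketch also shows how the metrics relate: a higher Quantum Volume means more layers per circuit, so a system that finishes the same workload in the same time with deeper circuits earns a higher CLOPS value.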