Function Calling in Generative AI

Uğur Özker
11 min read · Aug 8, 2024


What is the meaning of Function Calling?

Function calling is the ability to reliably connect LLMs to external tools to enable effective tool usage and interaction with external APIs.

LLMs like Llama, Mistral or GPT models have been fine-tuned to detect when a function needs to be called and then output JSON containing the arguments to call that function. The functions invoked through function calling act as tools in your AI application, and you can define more than one in a single request.

Function calling is an important ability for building LLM-powered chatbots or agents that need to retrieve context for an LLM or interact with external tools by converting natural language into API calls.

Function calling enables developers to create:

  • conversational agents that can efficiently use external tools to answer questions. For example, the query “How many Turkish liras are equal to 1000 dollars?” will be converted to a function call such as currency_converter(source_currency: string 'USD', target_currency: string 'TRY', amount: decimal '1000', when: datetime 'getdate()') (a sketch of such a tool definition follows this list)
  • LLM-powered solutions for extracting and tagging data (e.g., extracting currency details from natural language based inputs)
  • applications that can help convert natural language to API calls or valid database queries
  • conversational knowledge retrieval engines that interact with a knowledge base
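
To make the currency example above concrete, here is a minimal sketch of what such a tool definition could look like as a Python dict in the common function-calling schema. The function name, parameter names, and the required list are illustrative assumptions, not part of any existing API.

# Illustrative sketch of a tool definition for the currency_converter example above.
# All names, types, and defaults here are assumptions for demonstration purposes.
currency_converter_tool = {
    "type": "function",
    "function": {
        "name": "currency_converter",
        "description": "Convert an amount of money from a source currency to a target currency for a given date.",
        "parameters": {
            "type": "object",
            "properties": {
                "source_currency": {"type": "string", "description": "ISO code of the source currency, e.g. USD"},
                "target_currency": {"type": "string", "description": "ISO code of the target currency, e.g. TRY"},
                "amount": {"type": "number", "description": "Amount to convert, e.g. 1000"},
                "when": {"type": "string", "description": "Date of the exchange rate in dd-MM-yyyy format; defaults to today"},
            },
            "required": ["source_currency", "target_currency", "amount"],
        },
    },
}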

The function calling feature, which has emerged recently with generative artificial intelligence, is gradually becoming usable in production environments. When you examine the Berkeley Function-Calling Leaderboard, you will see that leading LLMs can perform this task with success rates around 90%.

You can easily use successful model architectures from the leaderboard, such as Mistral, Llama and Granite, in the watsonx.ai environment. Below I will show you step by step how you can build an end-to-end solution with watsonx.ai.

Advantages of using RAG and Function Calling together

One proven technique for reducing hallucinations in large language models is retrieval-augmented generation, or RAG. RAG uses a retriever that searches external data to enrich a prompt with context before sending it to the generator, which is the LLM.

As for the relationship between RAG and function calling: if a RAG scenario needs live or current data from an external provider, then before the context found by the retriever’s vector similarity search is sent to the LLM, function calling is used to query data from the external provider through a service. The retrieved context is thus combined with live, up-to-date external data, making it much richer and more usable.

For example, in your RAG application you can rely on the LLM’s ability to call functions that fetch real-time data, such as stock prices, order tracking, flight schedules or inventory levels, alongside the context you retrieve from the vector database. The purpose of this integrated RAG and function calling approach is to support the request with context (from existing data sources or real-time APIs) so that the LLM can access the correct information.

LLMs with the ability to invoke functions enable AI tools to perform certain tasks autonomously. For example, these capabilities allow LLMs to automate complex workflows involving data retrieval, processing, and analysis by leveraging the capabilities offered by other APIs and systems.
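
As a concrete illustration of the pattern described above, here is a minimal sketch of how retrieved context and a live function-call result might be merged before generation. retrieve_context, get_live_rates, and generate_answer are hypothetical helpers standing in for your vector store, external API, and LLM call.

# A minimal sketch of combining retrieved RAG context with live data obtained
# through a function call before generation. retrieve_context, get_live_rates,
# and generate_answer are hypothetical helpers, not real library functions.
def answer_with_rag_and_tools(question: str) -> str:
    # 1. Retrieve static context from the vector database (classic RAG step)
    documents = retrieve_context(question, top_k=3)

    # 2. Fetch live data via a tool; in a full implementation the LLM decides
    #    which tool to call, here it is called directly to keep the sketch short
    live_data = get_live_rates(source="USD", target="TRY")

    # 3. Merge the retrieved context and live data into one enriched prompt
    prompt = (
        "Answer the question using the context and live data below.\n"
        f"Context:\n{documents}\n\nLive data:\n{live_data}\n\nQuestion: {question}"
    )

    # 4. Generate the final answer with the enriched prompt
    return generate_answer(prompt)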

Function Calling Use Cases

Below is a list of use cases that can benefit from the function calling capability of LLMs:

  • Conversational Agents: Function calling can be used to create complex conversational agents or chatbots that answer complex questions by calling external APIs or external knowledge base and providing more relevant and useful responses.
  • Natural Language Understanding: It can convert natural language into structured JSON data, extract structured data from text, and perform tasks like named entity recognition, sentiment analysis, and keyword extraction.
  • Math Problem Solving: Function calling can be used to define custom functions to solve complex mathematical problems that require multiple steps and different types of advanced calculations.
  • API Integration: It can be used to effectively integrate LLMs with external APIs to fetch data or perform actions based on the input. This could be helpful to build either a QA system or creative assistant. In general, function calling can convert natural language into valid API calls.
  • Information Extraction: Function calling can be effectively used to extract specific information from a given input, such as retrieving relevant news stories or references from an article.

In this guide, we demonstrate how to prompt models like Mistral-Large-2 and other third-party models to perform function calling for different use cases on the watsonx.ai platform.

Demo on creating a Generative AI-supported financial assistant/advisor by integrating a third-party API with LLM models

First of all, we must define in detail the JSON object for the function we want to call. The function’s description parameter plays a key role in determining which function the LLM will choose among all defined functions, so it is very important to write a detailed and distinguishable description here. Later, when the LLM needs to call a function, it finds which one it should run by looking at the description fields, using a method similar to zero-shot classification.

Additionally, the description of each parameter in the parameters section explains how to find the correct value. The LLM can detect entities such as dates, numbers, text, locations, times, amounts of money, and people in the incoming natural language question and easily match them to the correct parameter using entity extraction.

In this way, the LLM will not only determine which function should be called from the function list via zero-shot classification, but will also automatically extract the required parameters from the incoming query sentence and assign them to variables, thanks to its entity extraction capability.

{
  "type": "function",
  "function": {
    "name": "finance_service",
    "description": "Finance advisor API that provides all financial needs and information: financial instructions, international finance statuses, money transactions, loans, debits, banking operations and more. You can find everything for the financial area in this API.",
    "parameters": {
      "type": "object",
      "properties": {
        "startdate": {
          "type": "string",
          "description": "Requested transaction start date or time in dd-MM-yyyy format. It is a required parameter; you must find and return the begin date value. If the query does not contain any start date, you can alternatively set the start date as 01-01-2024."
        },
        "enddate": {
          "type": "string",
          "description": "Requested transaction end date or time in dd-MM-yyyy format. It is a required parameter; you must find and return the last date value. If the query does not contain any end date, you can alternatively set the end date as 31-12-2024."
        }
      },
      "required": ["startdate", "enddate"]
    }
  }
}
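
As a small connecting step (not shown explicitly in the original code), such a definition is typically collected into a list and kept on the class, for example as self.tools, so it can be passed to the model on every request. The snippet below is a sketch under that assumption.

# Sketch: collect the definition above into the tools list handed to the model.
# finance_service_tool is assumed to be a Python dict holding the JSON object shown above.
finance_service_tool = {
    "type": "function",
    "function": {
        "name": "finance_service",
        # ... description and parameters exactly as defined above ...
    },
}

tools = [finance_service_tool]  # kept on the class as self.tools and used by tool_calling() below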

In the next step, we will access the platform via the IBM watsonx.ai LangChain SDK (langchain_ibm). For this access we need parameters such as the domain URL, API key and project ID provided to us by the cloud environment. After obtaining these parameters from the IBM watsonx.ai environment, we will be able to use all models in watsonx.ai for inference operations.

Within the scope of this scenario, we will work with the mistral-large-2 model to perform function calling and entity extraction operations.

def tool_calling(self, categories, query) -> str:
    # assumes at module level: import json, functools, and the ChatMessage class
    # from the Mistral SDK used for building the chat history
    api_key = self.watsonx_ai_api_key

    # map the tool name defined above to the local Python function that implements it
    names_to_functions = {
        'finance_service': functools.partial(self.get_finance_advice),
    }

    messages = [
        ChatMessage(role="user", content=query)
    ]

    from langchain_ibm import WatsonxLLM
    parameters = {
        "decoding_method": "sample",
        "max_new_tokens": 400,
        "min_new_tokens": 1,
        "temperature": 0.0,
        "top_k": 20,
        "top_p": 1,
    }

    client = WatsonxLLM(
        model_id="mistralai/mistral-large-latest",
        url="<<cloud-watsonx.ai-base-url>>",
        project_id="<<watsonx.ai-project-id>>",
        apikey=api_key,
        params=parameters,
    )

    # first call: the model decides which tool to use and with which arguments
    response = client.chat(messages=messages, tools=self.tools, tool_choice="auto")
    messages.append(response.choices[0].message)
    tool_call = response.choices[0].message.tool_calls[0]
    function_name = tool_call.function.name
    function_params = json.loads(tool_call.function.arguments)

    if self.validate_input(function_params):
        print("\nfunction_name: ", function_name, "\nfunction_params: ", function_params)
        # execute the selected tool with the extracted parameters
        function_result = names_to_functions[function_name](**function_params)
        messages.append(ChatMessage(role="tool", name=function_name,
                                    content=function_result["content"],
                                    tool_call_id=tool_call.id))
        # second call: generate the final answer from the tool output
        response = client.chat(messages=messages)
    else:
        raise ValueError("Invalid input parameters")

    return response.choices[0].message.content
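
A hypothetical usage of the method above might look like the following. FinancialAdvisor is an assumed name for the surrounding class, and the question is one of the test queries used later in this article.

# Hypothetical usage sketch; FinancialAdvisor stands for the class that holds
# the tools, credentials and methods shown in this article.
advisor = FinancialAdvisor()
answer = advisor.tool_calling(
    categories=advisor.categories,
    query="What is the highest level of the British Pound against the Turkish Lira in May 2024?",
)
print(answer)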

The prepared prompt is as follows. It is designed to find out what the question wants to express and which category it relates to. The category values it returns are then used to query the third-party institution’s API, and the results are stored for the final request sent to the institution.

def prompt_generator(self, question, table) -> str:
    prompt_category_detection = """You are a finance expert tasked with analyzing the following finance sentence and selecting the most relevant title from a specified table. Follow these steps:
**Retrieve Relevant Data**: From the identified table, find the finance topic and finance category that best match the content of the sentence.
**Finance Sentence:**
"{question}"
**Dictionary:**
{table}
**Instructions:**
- Determine the correct table from the dictionary.
- Use this table to find the finance topic and finance category values that are most relevant to the finance sentence
- Ensure that the values retrieved are the best match to the content of the sentence.
**Conclusion:**
Provide the result in the following format, only return following information not add any other word or sentence in response, give answer only in JSON Object format, only return answer with the following format do not use different format:
{{"category": "found_category", "id": category_id}}
"""
    return prompt_category_detection.format(question=question, table=table)

The category service call is made as follows, and the responses are stored in a variable to be used in the request sent to the main service.

def _parse_user_query(self, user_query: str) -> dict:
    # run the category-detection prompt through the LLM (zero-shot classification)
    category_prompt = self.prompt_generator(question=user_query, table=self.categories)
    category_response = self.model.invoke(category_prompt)
    category_json = json.loads(category_response.content)
    print("values found by LLM zero-shot classification:", category_json)
    return category_json

Using the categories returned from the first service and the start and end date parameters extracted from the question through entity extraction, the LLM automatically calls the following function. While making this call, the function is not invoked explicitly anywhere else in the code.

def get_finance_advice(self, startdate, enddate) -> Dict:
    # assumes at module level: import requests and from typing import Dict
    # concatenate the category codes selected in the previous step
    full_category_codes = ""
    for c in self.category_codes:
        full_category_codes += c["code"] + "-"

    # self.headers is assumed to hold the auth/accept headers required by the provider
    base_url = ('<<third-party-api-base-uri>>' + full_category_codes
                + '&startDate=' + startdate + '&endDate=' + enddate)
    response = requests.get(base_url + '&type=json', headers=self.headers)
    print(response.content)
    return {
        "status_code": response.status_code,
        "content": response.content
    }

Solution Architecture of the LLM based Financial Suggestion System

The solution architecture of the generative AI supported financial recommendation system, which works end to end very successfully, is as follows. The system receives queries from the user through an interface or API and sends them to the watsonx.ai environment in the background via the SDK. watsonx.ai then processes the incoming query with the Mistral-Large-2 model from its own foundation model repository and determines which function will be used (step 4). Next, entity extraction is performed within the function determined in step 5, and time-based parameters such as the start and end date are found and assigned to variables. Here the large language model uses zero-shot classification to find the right function and then entity extraction to find the right parameters from the query sentence sent in natural language.

In the next step, the LLM goes to the catalog embedded in watsonx.ai to call the correct function and initiates the call via Python. Another classification step is required to find the relevant catalog information during the call. For this purpose, we integrated the gpt-4o-mini model into watsonx.ai via the SDK and benefited from its classification capabilities on tabular data. The reason we use gpt-4o-mini here is that, in our experience, this model is more successful at tasks such as zero-shot classification on tabular data than other model architectures such as Mistral and Llama, and it consistently gives the same result. A second reason is to show that the watsonx.ai environment can access models in other clouds or environments via the SDK, in addition to the models on its own model server.

While doing all this, we took full advantage of LangChain capabilities at every step. After the classification operations on tabular data in the LLM inference steps shown in 2, 3, and 4, the service call is made through watsonx.ai with all the obtained parameters.

In the last step, the values returned from the service are sent back to the mistral-large-2 model in watsonx.ai, which is asked to produce an appropriate answer to the question by combining them with the service data received from the institution. Below you can see the step-by-step outputs of all the processes and the LLM answers produced by watsonx.ai.

Finally, the process is completed with text generation. Across the end-to-end flow, the needs of three different tasks (zero-shot classification, entity extraction, text generation) were met with two different models by sending five separate inference requests to the LLMs.

You can interpret the system and model outputs as follows.

Test Results:

Question 1: What is the highest level of the British Pound against the Turkish Lira in May 2024?

Categorization Results from LLM classification and entity extraction:

{'category': 'Exchange Rates', 'id': 35}
[{'name': '(GBP) British Pound (Buying Foreign Currency Price)', 'code': 'EN-POUND-BUY'}, {'name': '(GBP) English Pound (Foreign Currency Sale)', 'code': 'EN-POUND-SALE'}]

Function parameters (extracted by the LLM using entity extraction):

{'startdate': '01-05-2024', 'enddate': '31-05-2024'}

Text generation result from the LLM (in this step, the LLM uses the API response body values):

Answer from Watsonx.ai = “In May 2024, the highest level of the British Pound against the Turkish Lira was 41,2021 TRY”

Question 2: According to the latest data, what is the annual change rate of Stocks and Debt Securities in the Portfolio of Non-Residents?

Categorization Results from LLM classification and entity extraction:

{'category': 'International Investment Position', 'id': 36}
{'name': 'Stocks and Debt Securities in the Portfolio of Persons Resident Abroad', 'code': 'stock_and_debt_external_citizens'}
[{'name': 'Banks overseas banks deposits', 'code': 'bank_overseas_deposits'}, {'name': 'Banks Foreign Exchange Deposit Acounts', 'code': 'Bank_Foreign_Exchange_Deposits'}]

Function parameters (extracted by the LLM using entity extraction):

{'startdate': '01-01-2024', 'enddate': '31-12-2024'}

Text generation result from the LLM (in this step, the LLM uses the API response body values):

Answer from Watsonx.ai = “The annual change rate of Stocks and Debt Securities in the Portfolio of Non-Residents is 5.7%.”

Question 3: What percentage of total Debit Card and Credit Card expenditures were made in the Airlines and Accommodation sectors in 2023?

Categorization Results from LLM classification and entity extraction:

{'category': 'Bank and Credit Card Sectoral Expenditure Statistics', 'id': 77}
{'name': 'Debit Card and Credit Card Spending Amount', 'code': 'debit_credit_sp'}
[{'name': 'Airlines', 'code': 'trx-airlines-fnc'}, {'name': 'Accommodation', 'code': 'trx-accom-fnc'}]

Function parameters (extracted by the LLM using entity extraction):

{'startdate': '01-01-2023', 'enddate': '31-12-2023'}

Text generation result from the LLM (in this step, the LLM uses the API response body values):

Answer from Watsonx.ai = “In 2023, 10.5 percent of total Debit Card and Credit Card expenditures were made in the Airlines and Accommodation sectors. The total expenditure amount is 199,384,100 TL.”

Additionally, a training video explaining in detail how to do function calling with the mistral-large-2 model we use in watsonx.ai is available on the official Mistral site. What we do is set up a service call or RAG structure via LangChain, using the Mistral model in watsonx.ai.

I would also like to thank all my teammates, Seray Boynuyoğun, Ceyda Hamurcu, Bengü Sanem Pazvant, Merve Özmen, Yunus Emre Emik and Ahmet Sait Çelik, who contributed to this work we carried out as the IBM Turkey Client Engineering team. This cannot be done without a team. One Team!!


Uğur Özker

Computer Engineer, MSc, MBA, PMP®, Senior Solution Architect IBM