Azure OpenAI Service: Building Intelligent Apps with GPT Models

Introduction

Azure OpenAI Service is a managed offering that brings OpenAI models (GPT-4, GPT-3.5, embeddings, DALL-E) to the enterprise with Azure security, compliance, and built-in content filtering. This guide walks through provisioning the service, deploying a model, and building common patterns such as semantic search, retrieval-augmented generation (RAG), and function calling.

Prerequisites

  • Azure subscription with OpenAI access (request approval)
  • Python 3.8+ or .NET 6+
  • Basic understanding of REST APIs

Key Concepts

Concept           Description
Model deployment  A named instance of a model (GPT-4, GPT-3.5, embeddings)
Tokens            Text chunks (~4 chars each); pricing is based on token count
Temperature       Randomness (0 = deterministic, 2 = most creative)
Max tokens        Response length limit
Embeddings        Vector representation of text for semantic search
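
The ~4 characters per token rule of thumb is easy to check with the tiktoken library; cl100k_base is the encoding used by the GPT-4 and GPT-3.5-turbo families:

import tiktoken

# cl100k_base is the tokenizer behind GPT-4 / GPT-3.5-turbo
enc = tiktoken.get_encoding("cl100k_base")

text = "Azure OpenAI Service brings GPT models to the enterprise."
tokens = enc.encode(text)
print(f"{len(text)} chars -> {len(tokens)} tokens "
      f"(~{len(text) / len(tokens):.1f} chars/token)")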

Step-by-Step Guide

Step 1: Provision Azure OpenAI Resource

# Create the Azure OpenAI resource
az cognitiveservices account create \
  --name contoso-openai \
  --resource-group rg-ai \
  --kind OpenAI \
  --sku S0 \
  --location eastus

# Get endpoint and keys
az cognitiveservices account show \
  --name contoso-openai \
  --resource-group rg-ai \
  --query properties.endpoint

az cognitiveservices account keys list \
  --name contoso-openai \
  --resource-group rg-ai
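
Rather than hard-coding the endpoint and key, a common pattern is to export them as environment variables and read them at startup (the variable names below are a convention of this guide, not required by the SDK):

import os

# Variable names are illustrative; any names work
endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
api_key = os.environ["AZURE_OPENAI_KEY"]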

Step 2: Deploy GPT-4 Model

Azure Portal:

  1. Navigate to Azure OpenAI Studio
  2. Deployments → Create new deployment
  3. Select model: gpt-4 or gpt-4-32k
  4. Name: gpt4-deployment
  5. Deploy

Azure CLI (deployments are managed through the Azure control plane):

# Model versions vary by region; choose one available to your resource
az cognitiveservices account deployment create \
  --name contoso-openai \
  --resource-group rg-ai \
  --deployment-name gpt4-deployment \
  --model-name gpt-4 \
  --model-version "0613" \
  --model-format OpenAI \
  --sku-capacity 1 \
  --sku-name Standard

Step 3: Generate Completions (Python)

import openai

# Configuration style for the openai Python package v0.x (pre-1.0)
openai.api_type = "azure"
openai.api_base = "https://contoso-openai.openai.azure.com/"
openai.api_version = "2023-05-15"
openai.api_key = "YOUR_API_KEY"  # prefer Key Vault or environment variables

response = openai.ChatCompletion.create(
    engine="gpt4-deployment",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain Azure Functions in 100 words."}
    ],
    temperature=0.7,
    max_tokens=150,
    top_p=0.95
)

print(response['choices'][0]['message']['content'])
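
Non-streaming responses also carry a usage block, which is useful for cost tracking:

# Token accounting returned with every non-streaming response
usage = response['usage']
print(f"prompt={usage['prompt_tokens']}, "
      f"completion={usage['completion_tokens']}, "
      f"total={usage['total_tokens']}")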

Step 4: Prompt Engineering Best Practices

System Prompt (Context Setting):

system_prompt = """
You are an expert Azure architect. Provide concise, accurate answers with code examples when relevant.
Format responses in Markdown. Include links to Microsoft Learn documentation.
"""

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "How do I secure Azure Functions?"}
]

Few-Shot Learning:

messages = [
    {"role": "system", "content": "Classify support tickets by priority."},
    {"role": "user", "content": "Server is down"},
    {"role": "assistant", "content": "Priority: High"},
    {"role": "user", "content": "Password reset request"},
    {"role": "assistant", "content": "Priority: Medium"},
    {"role": "user", "content": "Feature suggestion"},
    {"role": "assistant", "content": "Priority: Low"},
    {"role": "user", "content": "Database corruption detected"}
]
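
With the examples in place, the call itself is short; temperature 0 keeps the labels deterministic (the output shown is illustrative):

response = openai.ChatCompletion.create(
    engine="gpt4-deployment",
    messages=messages,
    temperature=0,  # deterministic labels for classification
    max_tokens=10   # "Priority: High" is only a few tokens
)

print(response['choices'][0]['message']['content'])
# Illustrative output: Priority: High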

Step 5: Embeddings for Semantic Search

Generate Embeddings:

def get_embedding(text):
    response = openai.Embedding.create(
        engine="text-embedding-ada-002",  # name of your embeddings deployment
        input=text
    )
    return response['data'][0]['embedding']

# Index documents
documents = [
    "Azure Functions is a serverless compute service.",
    "Logic Apps provide workflow orchestration.",
    "App Service hosts web applications."
]

embeddings = [get_embedding(doc) for doc in documents]

Semantic Search with Cosine Similarity:

from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

def search(query, documents, embeddings, top_k=3):
    query_embedding = get_embedding(query)
    similarities = cosine_similarity([query_embedding], embeddings)[0]
    top_indices = np.argsort(similarities)[::-1][:top_k]
    
    return [(documents[i], similarities[i]) for i in top_indices]

results = search("How to run code without servers?", documents, embeddings)
for doc, score in results:
    print(f"{score:.3f}: {doc}")

Step 6: RAG (Retrieval-Augmented Generation)

Combine Search + GPT-4:

def rag_query(user_query, knowledge_base, embeddings):
    # Step 1: Retrieve relevant context
    top_docs = search(user_query, knowledge_base, embeddings, top_k=3)
    context = "\n".join([doc for doc, _ in top_docs])
    
    # Step 2: Generate answer with context
    messages = [
        {"role": "system", "content": f"Answer based on this context:\n{context}"},
        {"role": "user", "content": user_query}
    ]
    
    response = openai.ChatCompletion.create(
        engine="gpt4-deployment",
        messages=messages,
        temperature=0.3
    )
    
    return response['choices'][0]['message']['content']

answer = rag_query("What is Azure Functions?", documents, embeddings)
print(answer)

Step 7: Content Filtering & Responsible AI

Check Filter Annotations:

Content filters are enabled by default and configured per deployment in Azure OpenAI Studio; they are not a request parameter. Responses are annotated so you can inspect the outcome:

response = openai.ChatCompletion.create(
    engine="gpt4-deployment",
    messages=[...]
)

# Recent API versions annotate each choice with content filter results
choice = response['choices'][0]
if 'content_filter_results' in choice:
    print("Content filter results:", choice['content_filter_results'])
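
When the prompt itself trips a filter, the service rejects the request with HTTP 400 and error code content_filter; with the pre-1.0 SDK this surfaces as an InvalidRequestError (a hedged sketch):

user_input = "text from an end user"

try:
    response = openai.ChatCompletion.create(
        engine="gpt4-deployment",
        messages=[{"role": "user", "content": user_input}]
    )
except openai.error.InvalidRequestError as e:
    # Azure returns 400 with code "content_filter" for blocked prompts
    if getattr(e, "code", None) == "content_filter":
        print("Prompt was blocked by the content filter.")
    else:
        raise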

Implement Guardrails:

import re

SSN_PATTERN = re.compile(r'\b\d{3}-\d{2}-\d{4}\b')

def validate_input(user_input):
    # Reject obvious PII (US Social Security number pattern)
    if SSN_PATTERN.search(user_input):
        raise ValueError("Input contains sensitive data")

    # Bound prompt size before it reaches the model
    if len(user_input) > 4000:
        raise ValueError("Input too long")

    return True

def safe_completion(user_input):
    validate_input(user_input)
    return openai.ChatCompletion.create(...)

Step 8: Function Calling

Define Functions:

import json

functions = [
    {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location"]
        }
    }
]

response = openai.ChatCompletion.create(
    engine="gpt4-deployment",
    messages=[{"role": "user", "content": "What's the weather in Seattle?"}],
    functions=functions,
    function_call="auto"
)

# If GPT decides to call function
if response['choices'][0]['finish_reason'] == 'function_call':
    function_name = response['choices'][0]['message']['function_call']['name']
    arguments = json.loads(response['choices'][0]['message']['function_call']['arguments'])
    
    # Execute your own implementation of get_weather
    weather_data = get_weather(arguments['location'])
    
    # Send result back to GPT
    messages = [
        {"role": "user", "content": "What's the weather in Seattle?"},
        response['choices'][0]['message'],
        {"role": "function", "name": function_name, "content": json.dumps(weather_data)}
    ]
    
    final_response = openai.ChatCompletion.create(
        engine="gpt4-deployment",
        messages=messages
    )
    print(final_response['choices'][0]['message']['content'])

Advanced Patterns

Streaming Responses

response = openai.ChatCompletion.create(
    engine="gpt4-deployment",
    messages=[...],
    stream=True
)

for chunk in response:
    if 'content' in chunk['choices'][0]['delta']:
        print(chunk['choices'][0]['delta']['content'], end='')

Token Management

import tiktoken

def count_tokens(text, model="gpt-4"):
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

# Truncate to fit a token budget
def truncate_text(text, max_tokens=3000, model="gpt-4"):
    encoding = tiktoken.encoding_for_model(model)
    tokens = encoding.encode(text)
    if len(tokens) > max_tokens:
        return encoding.decode(tokens[:max_tokens])
    return text

Conversation Memory

class ConversationHistory:
    def __init__(self, max_tokens=4000):
        self.messages = []
        self.max_tokens = max_tokens
    
    def add_message(self, role, content):
        self.messages.append({"role": role, "content": content})
        self._trim_history()
    
    def _trim_history(self):
        total_tokens = sum(count_tokens(m['content']) for m in self.messages)
        while total_tokens > self.max_tokens and len(self.messages) > 1:
            self.messages.pop(1)  # Keep system message
            total_tokens = sum(count_tokens(m['content']) for m in self.messages)
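
A usage sketch; the system message stays pinned at index 0 while the oldest turns are dropped:

history = ConversationHistory(max_tokens=4000)
history.add_message("system", "You are a helpful assistant.")
history.add_message("user", "Explain Azure Functions in 100 words.")

response = openai.ChatCompletion.create(
    engine="gpt4-deployment",
    messages=history.messages
)
history.add_message("assistant", response['choices'][0]['message']['content'])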

Cost Optimization

Strategy           Description
Cache responses    Store and reuse answers to common queries
Use GPT-3.5-turbo  ~10x cheaper than GPT-4 for simple tasks
Limit max_tokens   Set a realistic response length cap
Batch requests     Process multiple queries in parallel
Monitor usage      Track token consumption via Azure Monitor
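
Caching is the easiest win for repeated queries. A minimal in-memory sketch (a production system would typically use Redis or a similar shared cache):

import hashlib
import json

_cache = {}

def cached_completion(messages, **kwargs):
    # Key on the full request so different parameters don't collide
    key = hashlib.sha256(
        json.dumps({"messages": messages, **kwargs}, sort_keys=True).encode()
    ).hexdigest()
    if key not in _cache:
        _cache[key] = openai.ChatCompletion.create(
            engine="gpt4-deployment", messages=messages, **kwargs
        )
    return _cache[key]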

Monitor Usage:

AzureDiagnostics
| where ResourceProvider == "MICROSOFT.COGNITIVESERVICES"
| where OperationName == "ChatCompletions_Create"
| extend Tokens = toint(parse_json(properties_s).total_tokens)
| summarize TotalTokens = sum(Tokens), RequestCount = count() by bin(TimeGenerated, 1h)

Security Best Practices

  • Store API keys in Azure Key Vault
  • Use managed identities for authentication (see the sketch after this list)
  • Implement rate limiting
  • Enable Azure Private Link for network isolation
  • Log all requests for audit
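
A minimal sketch of keyless authentication with the pre-1.0 SDK, assuming the azure-identity package is installed and the identity holds the Cognitive Services OpenAI User role:

from azure.identity import DefaultAzureCredential
import openai

# Request an Azure AD token scoped to Cognitive Services
credential = DefaultAzureCredential()
token = credential.get_token("https://cognitiveservices.azure.com/.default")

openai.api_type = "azure_ad"  # Azure AD auth instead of an api-key
openai.api_base = "https://contoso-openai.openai.azure.com/"
openai.api_version = "2023-05-15"
openai.api_key = token.token  # the bearer token is passed via api_key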

Troubleshooting

Issue: 429 Too Many Requests
Solution: Implement exponential backoff (see the sketch after this list); request a quota increase if it persists

Issue: Model not available in region
Solution: Check regional availability; use alternative region

Issue: High token costs
Solution: Use GPT-3.5 for simpler tasks; cache responses; truncate prompts
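
A minimal retry-with-backoff sketch for 429s with the pre-1.0 SDK:

import random
import time

import openai

def completion_with_backoff(max_retries=5, **kwargs):
    # Retry on rate limits with exponential backoff plus jitter
    for attempt in range(max_retries):
        try:
            return openai.ChatCompletion.create(**kwargs)
        except openai.error.RateLimitError:
            if attempt == max_retries - 1:
                raise
            time.sleep((2 ** attempt) + random.random())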

Best Practices

  • Start with a low temperature (e.g., 0.3) for factual tasks
  • Use system prompts to set context and constraints
  • Implement content filtering for user-generated prompts
  • Monitor token usage and set budgets
  • Test prompts in Azure OpenAI Studio playground

Key Takeaways

  • Azure OpenAI brings GPT-4 with enterprise security.
  • Embeddings enable semantic search and RAG patterns.
  • Prompt engineering significantly impacts output quality.
  • Function calling enables agent-like behavior.

Next Steps

  • Integrate with Azure Cognitive Search for hybrid search
  • Build chatbot with Bot Framework + Azure OpenAI
  • Implement fine-tuning for domain-specific tasks


What intelligent feature will you build first?