Azure OpenAI Service: Building Intelligent Apps with GPT Models
Introduction
Azure OpenAI Service is Microsoft's managed offering that brings OpenAI models, including GPT-4, GPT-3.5, embeddings, and DALL-E, to the enterprise with Azure's security, compliance, and built-in content filtering. This guide walks through provisioning the service, deploying a GPT-4 model, and building on top of it: prompt engineering, semantic search with embeddings, retrieval-augmented generation, and function calling.
Prerequisites
- Azure subscription with OpenAI access (request approval)
- Python 3.8+ or .NET 6+
- Basic understanding of REST APIs
Key Concepts
| Concept | Description |
|---|---|
| Model Deployment | Instance of GPT-4, GPT-3.5, embeddings model |
| Tokens | Chunks of text (roughly 4 English characters each); pricing is based on token count |
| Temperature | Sampling randomness (0 = most deterministic, 2 = most random) |
| Max Tokens | Response length limit |
| Embeddings | Vector representation of text for semantic search |
Step-by-Step Guide
Step 1: Provision Azure OpenAI Resource
az cognitiveservices account create \
  --name contoso-openai \
  --resource-group rg-ai \
  --kind OpenAI \
  --sku S0 \
  --location eastus

# Get endpoint and keys
az cognitiveservices account show \
  --name contoso-openai \
  --resource-group rg-ai \
  --query properties.endpoint

az cognitiveservices account keys list \
  --name contoso-openai \
  --resource-group rg-ai
Step 2: Deploy GPT-4 Model
Azure Portal:
- Navigate to Azure OpenAI Studio
- Deployments → Create new deployment
- Select model: gpt-4 or gpt-4-32k
- Name: gpt4-deployment
- Deploy
REST API (legacy data-plane deployment API; with newer API versions, create deployments in the portal or with az cognitiveservices account deployment create):
curl -X POST "https://contoso-openai.openai.azure.com/openai/deployments?api-version=2023-05-15" \
  -H "api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "scale_settings": {
      "scale_type": "Standard"
    }
  }'
Step 3: Generate Completions (Python)
import openai
openai.api_type = "azure"
openai.api_base = "https://contoso-openai.openai.azure.com/"
openai.api_version = "2023-05-15"
openai.api_key = "YOUR_API_KEY"
response = openai.ChatCompletion.create(
    engine="gpt4-deployment",  # your deployment name, not the model name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain Azure Functions in 100 words."}
    ],
    temperature=0.7,
    max_tokens=150,
    top_p=0.95
)

print(response['choices'][0]['message']['content'])
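The snippet above uses the legacy openai Python SDK (pre-1.0), which is what api_type = "azure" belongs to. On openai>=1.0 the equivalent call goes through the AzureOpenAI client; a minimal sketch, assuming the same endpoint, key, and deployment as above:

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://contoso-openai.openai.azure.com/",
    api_key="YOUR_API_KEY",
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="gpt4-deployment",  # the deployment name goes in `model` here
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain Azure Functions in 100 words."}
    ],
    temperature=0.7,
    max_tokens=150,
)
print(response.choices[0].message.content)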
Step 4: Prompt Engineering Best Practices
System Prompt (Context Setting):
system_prompt = """
You are an expert Azure architect. Provide concise, accurate answers with code examples when relevant.
Format responses in Markdown. Include links to Microsoft Learn documentation.
"""
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "How do I secure Azure Functions?"}
]
Few-Shot Learning:
messages = [
    {"role": "system", "content": "Classify support tickets by priority."},
    {"role": "user", "content": "Server is down"},
    {"role": "assistant", "content": "Priority: High"},
    {"role": "user", "content": "Password reset request"},
    {"role": "assistant", "content": "Priority: Medium"},
    {"role": "user", "content": "Feature suggestion"},
    {"role": "assistant", "content": "Priority: Low"},
    {"role": "user", "content": "Database corruption detected"}
]
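Sending the few-shot transcript is the same call as in Step 3; a quick sketch, assuming the gpt4-deployment from Step 2:

response = openai.ChatCompletion.create(
    engine="gpt4-deployment",
    messages=messages,
    temperature=0,  # classification should be deterministic
    max_tokens=10
)
print(response['choices'][0]['message']['content'])  # e.g. "Priority: High"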
Step 5: Embeddings for Semantic Search
Generate Embeddings:
def get_embedding(text):
    response = openai.Embedding.create(
        engine="text-embedding-ada-002",
        input=text
    )
    return response['data'][0]['embedding']

# Index documents
documents = [
    "Azure Functions is a serverless compute service.",
    "Logic Apps provide workflow orchestration.",
    "App Service hosts web applications."
]

embeddings = [get_embedding(doc) for doc in documents]
Semantic Search with Cosine Similarity:
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

def search(query, documents, embeddings, top_k=3):
    query_embedding = get_embedding(query)
    similarities = cosine_similarity([query_embedding], embeddings)[0]
    top_indices = np.argsort(similarities)[::-1][:top_k]
    return [(documents[i], similarities[i]) for i in top_indices]

results = search("How to run code without servers?", documents, embeddings)
for doc, score in results:
    print(f"{score:.3f}: {doc}")
Step 6: RAG (Retrieval-Augmented Generation)
Combine Search + GPT-4:
def rag_query(user_query, knowledge_base, embeddings):
    # Step 1: Retrieve relevant context
    top_docs = search(user_query, knowledge_base, embeddings, top_k=3)
    context = "\n".join([doc for doc, _ in top_docs])

    # Step 2: Generate answer with context
    messages = [
        {"role": "system", "content": f"Answer based on this context:\n{context}"},
        {"role": "user", "content": user_query}
    ]
    response = openai.ChatCompletion.create(
        engine="gpt4-deployment",
        messages=messages,
        temperature=0.3
    )
    return response['choices'][0]['message']['content']

answer = rag_query("What is Azure Functions?", documents, embeddings)
print(answer)
Step 7: Content Filtering & Responsible AI
Content Filters:
Content filtering is not a per-request parameter: Azure OpenAI applies it at the resource/deployment level, configured in Azure OpenAI Studio (Content filters). Every prompt and completion is screened against that configuration automatically, and the API surfaces the results as annotations on each choice:

response = openai.ChatCompletion.create(
    engine="gpt4-deployment",
    messages=[...]
)

# Inspect the filter annotations Azure attaches to each choice
choice = response['choices'][0]
if 'content_filter_results' in choice:
    print("Content filter results:", choice['content_filter_results'])
Implement Guardrails:
import re

def validate_input(user_input):
    # Check for PII patterns (example: US Social Security numbers)
    if re.search(r'\b\d{3}-\d{2}-\d{4}\b', user_input):
        raise ValueError("Input contains sensitive data")
    # Check input length
    if len(user_input) > 4000:
        raise ValueError("Input too long")
    return True

def safe_completion(user_input):
    validate_input(user_input)
    return openai.ChatCompletion.create(...)
Step 8: Function Calling
Define Functions:
functions = [
    {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location"]
        }
    }
]
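The schema above only describes get_weather; the implementation is yours. A hypothetical stub to make the flow below runnable (swap in a real weather API), plus the json import the handler uses:

import json

def get_weather(location, unit="celsius"):
    # Hypothetical stub -- replace with a call to a real weather service
    return {"location": location, "temperature": 22, "unit": unit}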
response = openai.ChatCompletion.create(
    engine="gpt4-deployment",
    messages=[{"role": "user", "content": "What's the weather in Seattle?"}],
    functions=functions,
    function_call="auto"
)

# If GPT decides to call the function
if response['choices'][0]['finish_reason'] == 'function_call':
    function_name = response['choices'][0]['message']['function_call']['name']
    arguments = json.loads(response['choices'][0]['message']['function_call']['arguments'])

    # Execute the function with the arguments GPT produced
    weather_data = get_weather(**arguments)

    # Send the result back to GPT
    messages = [
        {"role": "user", "content": "What's the weather in Seattle?"},
        response['choices'][0]['message'],
        {"role": "function", "name": function_name, "content": json.dumps(weather_data)}
    ]
    final_response = openai.ChatCompletion.create(
        engine="gpt4-deployment",
        messages=messages
    )
Advanced Patterns
Streaming Responses
response = openai.ChatCompletion.create(
    engine="gpt4-deployment",
    messages=[...],
    stream=True
)

for chunk in response:
    # Some Azure chunks carry no choices (e.g. filter metadata), so guard first
    if chunk['choices'] and 'content' in chunk['choices'][0]['delta']:
        print(chunk['choices'][0]['delta']['content'], end='')
Token Management
import tiktoken

def count_tokens(text, model="gpt-4"):
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

# Truncate to fit a token limit
def truncate_text(text, max_tokens=3000):
    encoding = tiktoken.encoding_for_model("gpt-4")
    tokens = encoding.encode(text)
    if len(tokens) > max_tokens:
        return encoding.decode(tokens[:max_tokens])
    return text
Conversation Memory
class ConversationHistory:
    def __init__(self, max_tokens=4000):
        self.messages = []
        self.max_tokens = max_tokens

    def add_message(self, role, content):
        self.messages.append({"role": role, "content": content})
        self._trim_history()

    def _trim_history(self):
        total_tokens = sum(count_tokens(m['content']) for m in self.messages)
        while total_tokens > self.max_tokens and len(self.messages) > 1:
            self.messages.pop(1)  # Keep the system message at index 0
            total_tokens = sum(count_tokens(m['content']) for m in self.messages)
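A quick usage sketch of the class above (relies on the count_tokens helper from Token Management):

history = ConversationHistory(max_tokens=4000)
history.add_message("system", "You are a helpful assistant.")
history.add_message("user", "What is Azure Functions?")

# Send the trimmed window on every turn
response = openai.ChatCompletion.create(
    engine="gpt4-deployment",
    messages=history.messages
)
history.add_message("assistant", response['choices'][0]['message']['content'])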
Cost Optimization
| Strategy | Description |
|---|---|
| Cache responses | Store common queries (see the sketch below this table) |
| Use GPT-3.5-turbo | Roughly an order of magnitude cheaper than GPT-4 for simple tasks |
| Limit max_tokens | Set realistic response length |
| Batch requests | Process multiple queries in parallel |
| Monitor usage | Track token consumption via Azure Monitor |
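A minimal in-memory sketch of the caching strategy above; a real system would likely use a shared store such as Redis with a TTL:

import hashlib

_cache = {}

def cached_completion(prompt, engine="gpt4-deployment"):
    # Key on deployment + prompt so different models don't collide
    key = hashlib.sha256(f"{engine}:{prompt}".encode()).hexdigest()
    if key not in _cache:
        response = openai.ChatCompletion.create(
            engine=engine,
            messages=[{"role": "user", "content": prompt}],
            temperature=0  # low randomness makes cached answers reusable
        )
        _cache[key] = response['choices'][0]['message']['content']
    return _cache[key]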
Monitor Usage (KQL in Log Analytics):
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.COGNITIVESERVICES"
| where OperationName == "ChatCompletions_Create"
| extend Tokens = toint(todynamic(properties_s).total_tokens)
| summarize TotalTokens = sum(Tokens), RequestCount = count() by bin(TimeGenerated, 1h)
Security Best Practices
- Store API keys in Azure Key Vault
- Use managed identities for authentication (see the sketch after this list)
- Implement rate limiting
- Enable Azure Private Link for network isolation
- Log all requests for audit
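A minimal keyless-auth sketch for the legacy SDK, assuming the azure-identity package is installed and the identity holds the Cognitive Services OpenAI User role:

import openai
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()
token = credential.get_token("https://cognitiveservices.azure.com/.default")

openai.api_type = "azure_ad"  # Azure AD token auth instead of an api-key
openai.api_key = token.token  # tokens are short-lived; refresh before expiry
openai.api_base = "https://contoso-openai.openai.azure.com/"
openai.api_version = "2023-05-15"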
Troubleshooting
Issue: 429 Too Many Requests
Solution: Implement exponential backoff (see the sketch after this list); request a quota increase
Issue: Model not available in region
Solution: Check regional availability; use alternative region
Issue: High token costs
Solution: Use GPT-3.5 for simpler tasks; cache responses; truncate prompts
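A minimal exponential-backoff sketch for 429s with the legacy SDK, using only the standard library (the tenacity package is a common alternative):

import time
import openai

def completion_with_backoff(max_retries=5, **kwargs):
    for attempt in range(max_retries):
        try:
            return openai.ChatCompletion.create(**kwargs)
        except openai.error.RateLimitError:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)  # wait 1s, 2s, 4s, 8s, ... between retries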
Best Practices
- Start with low temperature (0.3) for factual tasks
- Use system prompts to set context and constraints
- Implement content filtering for user-generated prompts
- Monitor token usage and set budgets
- Test prompts in Azure OpenAI Studio playground
Key Takeaways
- Azure OpenAI brings GPT-4 with enterprise security.
- Embeddings enable semantic search and RAG patterns.
- Prompt engineering significantly impacts output quality.
- Function calling enables agent-like behavior.
Next Steps
- Integrate with Azure Cognitive Search for hybrid search
- Build chatbot with Bot Framework + Azure OpenAI
- Implement fine-tuning for domain-specific tasks
Additional Resources
- Azure OpenAI Service documentation: https://learn.microsoft.com/azure/ai-services/openai/
- openai Python library: https://github.com/openai/openai-python
What intelligent feature will you build first?