Azure OpenAI Service: Building Intelligent Apps with GPT Models
Introduction
Azure OpenAI Service is Microsoft's managed offering that brings OpenAI models, including GPT-4, GPT-3.5, embeddings, and DALL-E, to the enterprise with Azure's security, compliance, and built-in content filtering. This guide walks through provisioning the service, deploying a GPT-4 model, and building on top of it: prompt engineering, semantic search with embeddings, retrieval-augmented generation, and function calling.
Prerequisites
- Azure subscription with OpenAI access (request approval)
- Python 3.8+ or .NET 6+
- Basic understanding of REST APIs
Key Concepts
| Concept | Description |
|---|---|
| Model Deployment | Instance of GPT-4, GPT-3.5, embeddings model |
| Tokens | Chunks of text (roughly 4 English characters each); pricing is based on token count |
| Temperature | Sampling randomness (0 = most deterministic, 2 = most random) |
| Max Tokens | Response length limit |
| Embeddings | Vector representation of text for semantic search |
Step-by-Step Guide
Step 1: Provision Azure OpenAI Resource
az cognitiveservices account create \
  --name contoso-openai \
  --resource-group rg-ai \
  --kind OpenAI \
  --sku S0 \
  --location eastus

# Get endpoint and keys
az cognitiveservices account show \
  --name contoso-openai \
  --resource-group rg-ai \
  --query properties.endpoint

az cognitiveservices account keys list \
  --name contoso-openai \
  --resource-group rg-ai
Step 2: Deploy GPT-4 Model
Azure Portal:
- Navigate to Azure OpenAI Studio
- Deployments → Create new deployment
- Select model: gpt-4 or gpt-4-32k
- Name: gpt4-deployment
- Deploy
REST API (legacy data-plane deployment API; with newer API versions, create deployments in the portal or with az cognitiveservices account deployment create):
curl -X POST "https://contoso-openai.openai.azure.com/openai/deployments?api-version=2023-05-15" \
  -H "api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "scale_settings": {
      "scale_type": "Standard"
    }
  }'
Step 3: Generate Completions (Python)
import openai
openai.api_type = "azure"
openai.api_base = "https://contoso-openai.openai.azure.com/"
openai.api_version = "2023-05-15"
openai.api_key = "YOUR_API_KEY"
response = openai.ChatCompletion.create(
    engine="gpt4-deployment",  # your deployment name, not the model name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain Azure Functions in 100 words."}
    ],
    temperature=0.7,
    max_tokens=150,
    top_p=0.95
)

print(response['choices'][0]['message']['content'])
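The snippet above uses the legacy openai Python SDK (pre-1.0), which is what api_type = "azure" belongs to. On openai>=1.0 the equivalent call goes through the AzureOpenAI client; a minimal sketch, assuming the same endpoint, key, and deployment as above:

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://contoso-openai.openai.azure.com/",
    api_key="YOUR_API_KEY",
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="gpt4-deployment",  # the deployment name goes in `model` here
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain Azure Functions in 100 words."}
    ],
    temperature=0.7,
    max_tokens=150,
)
print(response.choices[0].message.content)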
Step 4: Prompt Engineering Best Practices
System Prompt (Context Setting):
system_prompt = """
You are an expert Azure architect. Provide concise, accurate answers with code examples when relevant.
Format responses in Markdown. Include links to Microsoft Learn documentation.
"""
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "How do I secure Azure Functions?"}
]
Few-Shot Learning:
messages = [
    {"role": "system", "content": "Classify support tickets by priority."},
    {"role": "user", "content": "Server is down"},
    {"role": "assistant", "content": "Priority: High"},
    {"role": "user", "content": "Password reset request"},
    {"role": "assistant", "content": "Priority: Medium"},
    {"role": "user", "content": "Feature suggestion"},
    {"role": "assistant", "content": "Priority: Low"},
    {"role": "user", "content": "Database corruption detected"}
]
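Sending the few-shot transcript is the same call as in Step 3; a quick sketch, assuming the gpt4-deployment from Step 2:

response = openai.ChatCompletion.create(
    engine="gpt4-deployment",
    messages=messages,
    temperature=0,  # classification should be deterministic
    max_tokens=10
)
print(response['choices'][0]['message']['content'])  # e.g. "Priority: High"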
Step 5: Embeddings for Semantic Search
Generate Embeddings:
def get_embedding(text):
    response = openai.Embedding.create(
        engine="text-embedding-ada-002",
        input=text
    )
    return response['data'][0]['embedding']

# Index documents
documents = [
    "Azure Functions is a serverless compute service.",
    "Logic Apps provide workflow orchestration.",
    "App Service hosts web applications."
]

embeddings = [get_embedding(doc) for doc in documents]
Semantic Search with Cosine Similarity:
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

def search(query, documents, embeddings, top_k=3):
    query_embedding = get_embedding(query)
    similarities = cosine_similarity([query_embedding], embeddings)[0]
    top_indices = np.argsort(similarities)[::-1][:top_k]
    return [(documents[i], similarities[i]) for i in top_indices]

results = search("How to run code without servers?", documents, embeddings)
for doc, score in results:
    print(f"{score:.3f}: {doc}")
Step 6: RAG (Retrieval-Augmented Generation)
Combine Search + GPT-4:
def rag_query(user_query, knowledge_base, embeddings):
    # Step 1: Retrieve relevant context
    top_docs = search(user_query, knowledge_base, embeddings, top_k=3)
    context = "\n".join([doc for doc, _ in top_docs])

    # Step 2: Generate answer with context
    messages = [
        {"role": "system", "content": f"Answer based on this context:\n{context}"},
        {"role": "user", "content": user_query}
    ]
    response = openai.ChatCompletion.create(
        engine="gpt4-deployment",
        messages=messages,
        temperature=0.3
    )
    return response['choices'][0]['message']['content']

answer = rag_query("What is Azure Functions?", documents, embeddings)
print(answer)
Step 7: Content Filtering & Responsible AI
Content Filters:
Content filtering is not a per-request parameter: Azure OpenAI applies it at the resource/deployment level, configured in Azure OpenAI Studio (Content filters). Every prompt and completion is screened against that configuration automatically, and the API surfaces the results as annotations on each choice:

response = openai.ChatCompletion.create(
    engine="gpt4-deployment",
    messages=[...]
)

# Inspect the filter annotations Azure attaches to each choice
choice = response['choices'][0]
if 'content_filter_results' in choice:
    print("Content filter results:", choice['content_filter_results'])
Implement Guardrails:
import re

def validate_input(user_input):
    # Check for PII patterns (example: US Social Security numbers)
    if re.search(r'\b\d{3}-\d{2}-\d{4}\b', user_input):
        raise ValueError("Input contains sensitive data")
    # Check input length
    if len(user_input) > 4000:
        raise ValueError("Input too long")
    return True

def safe_completion(user_input):
    validate_input(user_input)
    return openai.ChatCompletion.create(...)
Step 8: Function Calling
Define Functions:
functions = [
    {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location"]
        }
    }
]
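The schema above only describes get_weather; the implementation is yours. A hypothetical stub to make the flow below runnable (swap in a real weather API), plus the json import the handler uses:

import json

def get_weather(location, unit="celsius"):
    # Hypothetical stub -- replace with a call to a real weather service
    return {"location": location, "temperature": 22, "unit": unit}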
response = openai.ChatCompletion.create(
    engine="gpt4-deployment",
    messages=[{"role": "user", "content": "What's the weather in Seattle?"}],
    functions=functions,
    function_call="auto"
)

# If GPT decides to call the function
if response['choices'][0]['finish_reason'] == 'function_call':
    function_name = response['choices'][0]['message']['function_call']['name']
    arguments = json.loads(response['choices'][0]['message']['function_call']['arguments'])

    # Execute the function with the arguments GPT produced
    weather_data = get_weather(**arguments)

    # Send the result back to GPT
    messages = [
        {"role": "user", "content": "What's the weather in Seattle?"},
        response['choices'][0]['message'],
        {"role": "function", "name": function_name, "content": json.dumps(weather_data)}
    ]
    final_response = openai.ChatCompletion.create(
        engine="gpt4-deployment",
        messages=messages
    )
Advanced Patterns
Streaming Responses
response = openai.ChatCompletion.create(
    engine="gpt4-deployment",
    messages=[...],
    stream=True
)

for chunk in response:
    # Some Azure chunks carry no choices (e.g. filter metadata), so guard first
    if chunk['choices'] and 'content' in chunk['choices'][0]['delta']:
        print(chunk['choices'][0]['delta']['content'], end='')
Token Management
import tiktoken

def count_tokens(text, model="gpt-4"):
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

# Truncate to fit a token limit
def truncate_text(text, max_tokens=3000):
    encoding = tiktoken.encoding_for_model("gpt-4")
    tokens = encoding.encode(text)
    if len(tokens) > max_tokens:
        return encoding.decode(tokens[:max_tokens])
    return text
Conversation Memory
class ConversationHistory:
    def __init__(self, max_tokens=4000):
        self.messages = []
        self.max_tokens = max_tokens

    def add_message(self, role, content):
        self.messages.append({"role": role, "content": content})
        self._trim_history()

    def _trim_history(self):
        total_tokens = sum(count_tokens(m['content']) for m in self.messages)
        while total_tokens > self.max_tokens and len(self.messages) > 1:
            self.messages.pop(1)  # Keep the system message at index 0
            total_tokens = sum(count_tokens(m['content']) for m in self.messages)
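A quick usage sketch of the class above (relies on the count_tokens helper from Token Management):

history = ConversationHistory(max_tokens=4000)
history.add_message("system", "You are a helpful assistant.")
history.add_message("user", "What is Azure Functions?")

# Send the trimmed window on every turn
response = openai.ChatCompletion.create(
    engine="gpt4-deployment",
    messages=history.messages
)
history.add_message("assistant", response['choices'][0]['message']['content'])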
Cost Optimization
| Strategy | Description |
|---|---|
| Cache responses | Store common queries (see the sketch below this table) |
| Use GPT-3.5-turbo | Roughly an order of magnitude cheaper than GPT-4 for simple tasks |
| Limit max_tokens | Set realistic response length |
| Batch requests | Process multiple queries in parallel |
| Monitor usage | Track token consumption via Azure Monitor |
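A minimal in-memory sketch of the caching strategy above; a real system would likely use a shared store such as Redis with a TTL:

import hashlib

_cache = {}

def cached_completion(prompt, engine="gpt4-deployment"):
    # Key on deployment + prompt so different models don't collide
    key = hashlib.sha256(f"{engine}:{prompt}".encode()).hexdigest()
    if key not in _cache:
        response = openai.ChatCompletion.create(
            engine=engine,
            messages=[{"role": "user", "content": prompt}],
            temperature=0  # low randomness makes cached answers reusable
        )
        _cache[key] = response['choices'][0]['message']['content']
    return _cache[key]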
Monitor Usage (KQL in Log Analytics):
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.COGNITIVESERVICES"
| where OperationName == "ChatCompletions_Create"
| extend Tokens = toint(todynamic(properties_s).total_tokens)
| summarize TotalTokens = sum(Tokens), RequestCount = count() by bin(TimeGenerated, 1h)
Security Best Practices
- Store API keys in Azure Key Vault
- Use managed identities for authentication (see the sketch after this list)
- Implement rate limiting
- Enable Azure Private Link for network isolation
- Log all requests for audit
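A minimal keyless-auth sketch for the legacy SDK, assuming the azure-identity package is installed and the identity holds the Cognitive Services OpenAI User role:

import openai
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()
token = credential.get_token("https://cognitiveservices.azure.com/.default")

openai.api_type = "azure_ad"  # Azure AD token auth instead of an api-key
openai.api_key = token.token  # tokens are short-lived; refresh before expiry
openai.api_base = "https://contoso-openai.openai.azure.com/"
openai.api_version = "2023-05-15"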
Troubleshooting
Issue: 429 Too Many Requests
Solution: Implement exponential backoff (see the sketch after this list); request a quota increase
Issue: Model not available in region
Solution: Check regional availability; use alternative region
Issue: High token costs
Solution: Use GPT-3.5 for simpler tasks; cache responses; truncate prompts
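A minimal exponential-backoff sketch for 429s with the legacy SDK, using only the standard library (the tenacity package is a common alternative):

import time
import openai

def completion_with_backoff(max_retries=5, **kwargs):
    for attempt in range(max_retries):
        try:
            return openai.ChatCompletion.create(**kwargs)
        except openai.error.RateLimitError:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)  # wait 1s, 2s, 4s, 8s, ... between retries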
Best Practices
- Start with low temperature (0.3) for factual tasks
- Use system prompts to set context and constraints
- Implement content filtering for user-generated prompts
- Monitor token usage and set budgets
- Test prompts in Azure OpenAI Studio playground
Key Takeaways
- Azure OpenAI brings GPT-4 with enterprise security.
- Embeddings enable semantic search and RAG patterns.
- Prompt engineering significantly impacts output quality.
- Function calling enables agent-like behavior.
Next Steps
- Integrate with Azure Cognitive Search for hybrid search
- Build chatbot with Bot Framework + Azure OpenAI
- Implement fine-tuning for domain-specific tasks
Additional Resources
- Azure OpenAI Service documentation: https://learn.microsoft.com/azure/ai-services/openai/
- openai Python library: https://github.com/openai/openai-python
What intelligent feature will you build first?