Azure Blob Storage - Enterprise Data Management
Why Blob Storage?
Azure Blob Storage is Microsoft's object storage solution for the cloud:
- Unlimited capacity - Store petabytes of data
- Cost-effective - Tiered storage from hot to archive
- Secure - Encryption, access controls, RBAC
- Accessible - REST APIs, SDKs, mounting as drives
Perfect for:
- Backup and disaster recovery
- Data lakes and analytics
- Media storage (images, videos, documents)
- Log and telemetry data
Understanding Blob Storage Architecture
Storage Account Types
┌─────────────────────────────────────────────────────────────────────────────┐
│ Blob Storage Types │
└─────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│ General Purpose v2 (GPv2) │
├─────────────────────────────────────────────────────────────────────────────┤
│ • Standard performance │
│ • All access tiers (Hot, Cool, Cold, Archive) │
│ • Supports all blob types (Block, Page, Append) │
│ • Best for: Most scenarios, cost-effective storage │
│ │
│ Pricing: ~$0.018/GB/month (Hot) → ~$0.001-0.002/GB/month (Archive), by region │
└─────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│ Block Blob Storage │
├─────────────────────────────────────────────────────────────────────────────┤
│ • Premium performance (SSD) │
│ • Hot access tier only │
│ • Optimized for high throughput workloads │
│ • Best for: Analytics, AI/ML, real-time processing │
│ │
│ Pricing: ~$0.15/GB (much faster, more expensive) │
└─────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│ Azure Data Lake Storage Gen2 │
├─────────────────────────────────────────────────────────────────────────────┤
│ • Hierarchical namespace (like file systems) │
│ • Hadoop-compatible │
│ • Optimized for big data analytics │
│ • Best for: Spark, Databricks, Synapse, HDInsight │
│ │
│ Same as GPv2 but with HNS enabled │
└─────────────────────────────────────────────────────────────────────────────┘
Blob Types
| Type | Best For | Max Size | Use Cases |
|---|---|---|---|
| Block Blobs | General-purpose text and binary objects | ~190.7 TiB | Images, videos, documents, backups |
| Page Blobs | Random read/write access | 8 TiB | VHDs, disk storage, page-aligned data |
| Append Blobs | Append-only writes | ~195 GiB | Log files, audit trails, streaming |
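The maximum sizes in the table follow from per-blob block limits. The sketch below assumes the limits published for recent service versions (50,000 blocks per blob, up to 4000 MiB per block for block blobs and 4 MiB per block for append blobs); `BlobSizeLimits` is a hypothetical helper, not part of the Azure SDK.

```csharp
using System;

// Hypothetical helper: derives maximum blob sizes from block limits.
public static class BlobSizeLimits
{
    private const long MiB = 1024L * 1024L;
    private const double GiB = 1024.0 * 1024 * 1024;
    private const double TiB = GiB * 1024;

    // Maximum blob size in bytes = blocks per blob * bytes per block.
    public static long MaxBytes(long maxBlocks, long maxBlockSizeMiB) =>
        maxBlocks * maxBlockSizeMiB * MiB;

    public static double ToGiB(long bytes) => bytes / GiB;

    public static double ToTiB(long bytes) => bytes / TiB;
}
```

Under these assumptions, `BlobSizeLimits.ToTiB(BlobSizeLimits.MaxBytes(50_000, 4000))` gives roughly 190.7 for block blobs, and `BlobSizeLimits.ToGiB(BlobSizeLimits.MaxBytes(50_000, 4))` gives roughly 195.3 for append blobs.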
Step 1: Storing and Retrieving Data
Using the Azure SDK
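using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;
using Microsoft.Extensions.Logging;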
public class BlobStorageService
{
private readonly BlobServiceClient _blobServiceClient;
private readonly ILogger<BlobStorageService> _logger;
public BlobStorageService(
BlobServiceClient blobServiceClient,
ILogger<BlobStorageService> logger)
{
_blobServiceClient = blobServiceClient;
_logger = logger;
}
public async Task<string> UploadFileAsync(
string containerName,
string fileName,
Stream content,
BlobUploadOptions options = null)
{
// Get container client
var containerClient = _blobServiceClient.GetBlobContainerClient(containerName);
// Ensure container exists
await containerClient.CreateIfNotExistsAsync(PublicAccessType.None);
// Get blob client
var blobClient = containerClient.GetBlobClient(fileName);
// Upload with options
var response = await blobClient.UploadAsync(
content,
options ?? new BlobUploadOptions
{
// Set content type based on file extension
HttpHeaders = new BlobHttpHeaders
{
ContentType = GetContentType(fileName)
},
// Set metadata
Metadata = new Dictionary<string, string>
{
["uploaded-by"] = "system",
["uploaded-date"] = DateTime.UtcNow.ToString("O")
}
});
_logger.LogInformation("Uploaded blob: {BlobName}", blobClient.Uri);
return blobClient.Uri.ToString();
}
public async Task<Stream> DownloadFileAsync(
string containerName,
string fileName)
{
var containerClient = _blobServiceClient.GetBlobContainerClient(containerName);
var blobClient = containerClient.GetBlobClient(fileName);
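// DownloadStreamAsync returns a BlobDownloadStreamingResult; the data stream is its Content property
var response = await blobClient.DownloadStreamAsync();
return response.Value.Content;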
}
public async Task<List<string>> ListBlobsAsync(
string containerName,
string prefix = null)
{
var containerClient = _blobServiceClient.GetBlobContainerClient(containerName);
var blobs = new List<string>();
await foreach (var blobItem in containerClient.GetBlobsAsync(
prefix: prefix))
{
blobs.Add(blobItem.Name);
}
return blobs;
}
private string GetContentType(string fileName)
{
var extension = Path.GetExtension(fileName).ToLowerInvariant();
return extension switch
{
".jpg" or ".jpeg" => "image/jpeg",
".png" => "image/png",
".gif" => "image/gif",
".pdf" => "application/pdf",
".json" => "application/json",
".xml" => "application/xml",
".txt" => "text/plain",
".html" => "text/html",
".css" => "text/css",
".js" => "application/javascript",
_ => "application/octet-stream"
};
}
}
Using Managed Identity (Recommended)
// Program.cs - Configure with managed identity
using Azure.Identity;
using Azure.Storage.Blobs;
builder.Services.AddSingleton<BlobServiceClient>(sp =>
{
var configuration = sp.GetRequiredService<IConfiguration>();
var blobServiceUri = new Uri($"https://{configuration["Storage:AccountName"]}.blob.core.windows.net");
return new BlobServiceClient(
blobServiceUri,
new DefaultAzureCredential()); // Uses the managed identity when running in Azure; falls back to developer credentials locally
});
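The endpoint above is built by string interpolation, so a misconfigured `Storage:AccountName` only fails later as a DNS or authentication error. A minimal sketch of validating the name first, assuming the documented naming rule (3 to 24 characters, lowercase letters and digits only); `BlobEndpoint` is a hypothetical helper name.

```csharp
using System;
using System.Linq;

// Hypothetical helper: validates an account name and builds its blob endpoint.
public static class BlobEndpoint
{
    public static Uri ForAccount(string accountName)
    {
        // Storage account names: 3-24 characters, lowercase letters and digits only.
        var valid = !string.IsNullOrEmpty(accountName)
            && accountName.Length >= 3
            && accountName.Length <= 24
            && accountName.All(c => (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'));

        if (!valid)
        {
            throw new ArgumentException(
                "Storage account names must be 3-24 lowercase letters or digits.",
                nameof(accountName));
        }

        return new Uri($"https://{accountName}.blob.core.windows.net");
    }
}
```

In the registration above, `BlobEndpoint.ForAccount(configuration["Storage:AccountName"])` could replace the inline `new Uri(...)` so that a bad setting fails at startup.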
Step 2: Access Tiers and Cost Optimization
Understanding Access Tiers
┌─────────────────────────────────────────────────────────────────────────────┐
│ Storage Tiers Comparison │
└─────────────────────────────────────────────────────────────────────────────┘
┌─────────────┬─────────────┬────────────────────────────────────────────────┐
│ Tier │ Cost/GB/mo │ Best For │
├─────────────┼─────────────┼────────────────────────────────────────────────┤
│ Hot │ ~$0.018 │ Frequently accessed data, real-time processing │
│ │ │ Daily operations, active workloads │
├─────────────┼─────────────┼────────────────────────────────────────────────┤
│ Cool │ ~$0.009 │ Infrequently accessed data kept for 30+ days │
│ │ │ Backups, analytics data, seasonal data │
├─────────────┼─────────────┼────────────────────────────────────────────────┤
│ Cold │ ~$0.004 │ Rarely accessed data kept for 90+ days │
│ │ │ Archives, compliance data, long-term storage │
├─────────────┼─────────────┼────────────────────────────────────────────────┤
│ Archive │ ~$0.001 │ Rarely accessed data kept for 180+ days │
│ │ │ Legal archives, compliance, long-term backups │
│ │ │ Offline: rehydration (up to 15 hours at │
│ │ │ standard priority) is required before reading │
└─────────────┴─────────────┴────────────────────────────────────────────────┘
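To make the comparison concrete, here is a small sketch that picks a tier from the 30/90/180-day thresholds in the table and estimates a monthly storage bill. `TierAdvisor` is a hypothetical helper, the price is supplied by the caller because it varies by region and redundancy, and the heuristic deliberately ignores retrieval, transaction, and early-deletion charges.

```csharp
using System;

// Hypothetical helper: a rule-of-thumb tier picker, not an official recommendation engine.
public static class TierAdvisor
{
    // Maps days since last access to a tier using the table's 30/90/180-day thresholds.
    public static string SuggestTier(int daysSinceLastAccess)
    {
        if (daysSinceLastAccess < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(daysSinceLastAccess));
        }

        if (daysSinceLastAccess >= 180) return "Archive";
        if (daysSinceLastAccess >= 90) return "Cold";
        if (daysSinceLastAccess >= 30) return "Cool";
        return "Hot";
    }

    // Storage-only estimate: size in GB times the caller-supplied price per GB per month.
    public static decimal MonthlyStorageCost(decimal sizeGb, decimal pricePerGbMonth) =>
        sizeGb * pricePerGbMonth;
}
```

For example, `TierAdvisor.MonthlyStorageCost(1000m, 0.018m)` estimates 18 (dollars per month) for 1000 GB at an assumed Hot price of $0.018/GB.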
Setting Access Tiers
public class TierManagementService
{
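private readonly BlobServiceClient _blobServiceClient;
private readonly ILogger<TierManagementService> _logger;
public TierManagementService(
BlobServiceClient blobServiceClient,
ILogger<TierManagementService> logger)
{
_blobServiceClient = blobServiceClient;
_logger = logger;
}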
public async Task SetAccessTierAsync(
string containerName,
string blobName,
AccessTier tier)
{
var containerClient = _blobServiceClient.GetBlobContainerClient(containerName);
var blobClient = containerClient.GetBlobClient(blobName);
await blobClient.SetAccessTierAsync(tier);
_logger.LogInformation("Set {Blob} access tier to {Tier}", blobName, tier);
}
public async Task MoveToArchiveAsync(string containerName, string blobName)
{
await SetAccessTierAsync(containerName, blobName, AccessTier.Archive);
}
public async Task RehydrateFromArchiveAsync(string containerName, string blobName)
{
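// Rehydration is asynchronous: standard priority can take up to 15 hours.
// BlobClient.SetAccessTierAsync also accepts a rehydratePriority argument
// (RehydratePriority.High) for faster, more expensive rehydration.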
await SetAccessTierAsync(containerName, blobName, AccessTier.Cool);
}
}
Lifecycle Management Policies
{
"rules": [
{
"name": "aging-rule",
"enabled": true,
"type": "Lifecycle",
"definition": {
"filters": {
"blobTypes": ["blockBlob"],
"prefixMatch": ["container1/logs", "container2/backups"]
},
"actions": {
"baseBlob": {
"tierToCool": {"daysAfterModificationGreaterThan": 30},
"tierToArchive": {"daysAfterModificationGreaterThan": 90},
"delete": {"daysAfterModificationGreaterThan": 365}
},
"snapshot": {
"delete": {"daysAfterCreationGreaterThan": 90}
}
}
}
},
{
"name": "archive-immediately-rule",
"enabled": true,
"type": "Lifecycle",
"definition": {
"filters": {
"blobTypes": ["blockBlob"],
"prefixMatch": ["archive/important-files"]
},
"actions": {
"baseBlob": {
"tierToArchive": {"daysAfterModificationGreaterThan": 1},
"delete": {"daysAfterModificationGreaterThan": 730}
}
}
}
}
]
}
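To see what the `aging-rule` above does to a single blob over time, the sketch below mirrors its three `baseBlob` thresholds. It takes the `GreaterThan` in the property names literally (a strict comparison) and assumes that when several actions qualify, the cheapest one wins (delete before archive before cool). `AgingRuleSimulator` is a hypothetical name used only for illustration; the real evaluation happens inside the storage service.

```csharp
using System;

// Hypothetical simulator for the "aging-rule" baseBlob actions (30 / 90 / 365 days).
public static class AgingRuleSimulator
{
    // Returns the action expected for a blob last modified the given number of days ago.
    public static string ActionFor(int daysSinceModification)
    {
        if (daysSinceModification < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(daysSinceModification));
        }

        // Checked from the largest threshold down so the cheapest qualifying action is chosen.
        if (daysSinceModification > 365) return "delete";
        if (daysSinceModification > 90) return "tierToArchive";
        if (daysSinceModification > 30) return "tierToCool";
        return "none";
    }
}
```

A blob modified exactly 30 days ago is left alone under this reading, while one modified 31 days ago moves to Cool.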
# Apply the lifecycle policy. This is a management-plane operation, so the data-plane
# BlobServiceClient cannot set it. Use the Azure CLI (shown here), the portal, an
# ARM/Bicep template, or the Azure.ResourceManager.Storage library.
# <resource-group> is a placeholder; policy.json is the JSON above saved to a file.
az storage account management-policy create \
--account-name mystorageaccount \
--resource-group <resource-group> \
--policy @policy.json
Step 3: Security and Access Control
SAS Tokens (Shared Access Signatures)
using System;
using Azure.Storage;
using Azure.Storage.Sas;
public class SasTokenGenerator
{
private readonly StorageSharedKeyCredential _sharedKeyCredential;
// Load the account key from configuration or Key Vault; never hard-code it
public SasTokenGenerator(StorageSharedKeyCredential sharedKeyCredential)
{
_sharedKeyCredential = sharedKeyCredential;
}
public string GenerateSasToken(
string containerName,
string blobName,
TimeSpan expiry,
bool allowRead = true,
bool allowWrite = false,
bool allowDelete = false)
{
// Describe the SAS: which blob it covers and when it expires
var sasBuilder = new BlobSasBuilder
{
BlobContainerName = containerName,
BlobName = blobName,
Resource = "b", // blob
ExpiresOn = DateTimeOffset.UtcNow.Add(expiry)
};
// BlobSasBuilder.Permissions has no public setter; combine the flags and call SetPermissions
BlobSasPermissions permissions = 0;
if (allowRead) permissions |= BlobSasPermissions.Read;
if (allowWrite) permissions |= BlobSasPermissions.Write;
if (allowDelete) permissions |= BlobSasPermissions.Delete;
sasBuilder.SetPermissions(permissions);
// Sign with the account key; the result is the query string to append to the blob URL
var sasToken = sasBuilder.ToSasQueryParameters(_sharedKeyCredential);
return sasToken.ToString();
}
// Time-limited read-only access for users
public string GenerateReadToken(string containerName, string blobName)
{
return GenerateSasToken(containerName, blobName,
expiry: TimeSpan.FromHours(1), // 1 hour validity
allowRead: true);
}
// Temporary upload access to a whole container
public string GenerateUploadToken(string containerName, TimeSpan expiry)
{
var sasBuilder = new BlobSasBuilder
{
BlobContainerName = containerName,
Resource = "c", // container
ExpiresOn = DateTimeOffset.UtcNow.Add(expiry)
};
sasBuilder.SetPermissions(
BlobContainerSasPermissions.Write | BlobContainerSasPermissions.Create);
return sasBuilder.ToSasQueryParameters(_sharedKeyCredential).ToString();
}
}
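The generator above returns only the SAS query string, so callers still need to attach it to the blob URL before handing it out. A minimal sketch of that step using only `System.UriBuilder`; `SasUrl` is a hypothetical helper, and it assumes the blob URI carries no query of its own (any existing query, such as a snapshot parameter, would be overwritten).

```csharp
using System;

// Hypothetical helper: combines a blob URI with a SAS query string.
public static class SasUrl
{
    public static Uri Attach(Uri blobUri, string sasToken)
    {
        if (blobUri is null) throw new ArgumentNullException(nameof(blobUri));
        if (string.IsNullOrWhiteSpace(sasToken))
        {
            throw new ArgumentException("SAS token must not be empty.", nameof(sasToken));
        }

        var builder = new UriBuilder(blobUri)
        {
            // Strip any leading '?' so the token works whether or not it includes one.
            Query = sasToken.TrimStart('?')
        };
        return builder.Uri;
    }
}
```

The sample token used in testing (`sv=2022-11-02&sig=abc`) is made up; a real token comes from `GenerateSasToken`.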
Role-Based Access Control (RBAC)
{
"roleAssignments": [
{
"roleDefinitionId": "Storage Blob Data Reader",
"principalId": "user-oid",
"scope": "/subscriptions/.../resourceGroups/rg/providers/Microsoft.Storage/storageAccounts/myaccount"
},
{
"roleDefinitionId": "Storage Blob Data Contributor",
"principalId": "app-managed-identity",
"scope": "/subscriptions/.../resourceGroups/rg/providers/Microsoft.Storage/storageAccounts/myaccount/blobServices/default/containers/mydata"
}
]
}
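// Assigning an RBAC role to a managed identity
// Sketch using Azure.Identity, Azure.ResourceManager and Azure.ResourceManager.Authorization
// (plus its .Models namespace); verify type names against your installed package version.
public async Task AssignBlobDataReaderAsync(
string subscriptionId, string scopeResourceId, Guid principalObjectId)
{
var armClient = new ArmClient(new DefaultAzureCredential());
// The scope is an ARM resource ID (account or container), not the blob endpoint URI
var roleAssignments = armClient.GetRoleAssignments(new ResourceIdentifier(scopeResourceId));
// Built-in role ID for Storage Blob Data Reader
var roleDefinitionId = new ResourceIdentifier(
$"/subscriptions/{subscriptionId}/providers/Microsoft.Authorization/roleDefinitions/2a2b9908-6ea1-4ae2-8e65-a410df84e7d1");
// principalObjectId is the object (principal) ID of the identity, not its client ID
var content = new RoleAssignmentCreateOrUpdateContent(roleDefinitionId, principalObjectId)
{
PrincipalType = RoleManagementPrincipalType.ServicePrincipal
};
// Each role assignment is named by a new GUID
await roleAssignments.CreateOrUpdateAsync(
WaitUntil.Completed, Guid.NewGuid().ToString(), content);
}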
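Role assignments are scoped by ARM resource ID, following the two `scope` patterns in the JSON above (account level and container level). A small sketch that builds those strings; `StorageScopes` and its parameter names are hypothetical, and the values passed in are placeholders.

```csharp
using System;

// Hypothetical helper: builds ARM scope strings for storage role assignments.
public static class StorageScopes
{
    // Scope covering every container in the storage account.
    public static string Account(string subscriptionId, string resourceGroup, string accountName) =>
        $"/subscriptions/{subscriptionId}/resourceGroups/{resourceGroup}" +
        $"/providers/Microsoft.Storage/storageAccounts/{accountName}";

    // Narrower scope covering a single container.
    public static string Container(
        string subscriptionId, string resourceGroup, string accountName, string containerName) =>
        Account(subscriptionId, resourceGroup, accountName) +
        $"/blobServices/default/containers/{containerName}";
}
```

Granting at container scope where possible keeps an identity's access limited to the data it actually needs.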
Encryption
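// Encryption at rest is automatic: all data is encrypted with Microsoft-managed keys by default
// Options for additional control:
// 1. Customer-provided keys (CPK): the client sends an AES-256 key with each request.
// (Customer-managed keys, CMK, are different: they live in Azure Key Vault and are
// configured on the storage account, not in client code.)
var blobServiceClient = new BlobServiceClient(
new Uri("https://account.blob.core.windows.net"),
new DefaultAzureCredential(),
new BlobClientOptions
{
// Placeholder: load a Base64-encoded 256-bit key from a secure store
CustomerProvidedKey = new CustomerProvidedKey(
key: Convert.FromBase64String("<base64-encoded-aes-256-key>"))
});
// 2. Require secure transfer (HTTPS only): enable "Secure transfer required" on the
// storage account, and use https endpoints on the client
var connectionString = "DefaultEndpointsProtocol=https;...";
// 3. Virtual network endpoints
// Configure in Azure Portal -> Networking -> Virtual networks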
Step 4: Performance Optimization
Block-by-Block Upload
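using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Specialized; // provides GetBlockBlobClient
using Microsoft.Extensions.Logging;
// Stages blocks one at a time and then commits them as a single blob.
// For parallel block uploads, BlobClient.UploadAsync with
// BlobUploadOptions.TransferOptions (StorageTransferOptions.MaximumConcurrency) does this for you.
public class ParallelUploadService
{
private readonly BlobServiceClient _blobServiceClient;
private readonly ILogger<ParallelUploadService> _logger;
public ParallelUploadService(
BlobServiceClient blobServiceClient,
ILogger<ParallelUploadService> logger)
{
_blobServiceClient = blobServiceClient;
_logger = logger;
}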
public async Task UploadLargeFileAsync(
string containerName,
string blobName,
string filePath,
int maxBlockSize = 4 * 1024 * 1024) // 4MB default
{
var containerClient = _blobServiceClient.GetBlobContainerClient(containerName);
var blobClient = containerClient.GetBlockBlobClient(blobName);
var fileInfo = new FileInfo(filePath);
var fileSize = fileInfo.Length;
// Calculate number of blocks
var blockCount = (int)Math.Ceiling((double)fileSize / maxBlockSize);
var blockIds = new List<string>();
using var fileStream = File.OpenRead(filePath);
var buffer = new byte[maxBlockSize];
for (int i = 0; i < blockCount; i++)
{
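// Read one block; a single read may return fewer bytes than requested,
// so keep reading until the block is full or the file ends
var bytesRead = 0;
int n;
while (bytesRead < maxBlockSize &&
(n = await fileStream.ReadAsync(buffer, bytesRead, maxBlockSize - bytesRead)) > 0)
{
bytesRead += n;
}
var block = new byte[bytesRead];
Array.Copy(buffer, block, bytesRead);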
// Generate block ID (must be base64, same length)
var blockId = Convert.ToBase64String(
Encoding.UTF8.GetBytes(
i.ToString("D10")));
// Upload block
using var blockStream = new MemoryStream(block);
await blobClient.StageBlockAsync(blockId, blockStream);
blockIds.Add(blockId);
_logger.LogInformation("Uploaded block {BlockNumber}/{TotalBlocks}",
i + 1, blockCount);
}
// Commit all blocks
await blobClient.CommitBlockListAsync(blockIds);
_logger.LogInformation("Uploaded {FileName} ({Size} MB)",
blobName, fileSize / (1024 * 1024));
}
}
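The block arithmetic in the upload loop can be checked in isolation. The sketch below plans the blocks for a file of a given size, using the same ID scheme as the code above (zero-padded index, UTF-8 bytes, Base64) so that every ID has the same length. `BlockPlanner` is a hypothetical helper that performs no I/O.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: computes block IDs, offsets and lengths without touching storage.
public static class BlockPlanner
{
    public static List<(string Id, long Offset, int Length)> Plan(long fileSize, int blockSize)
    {
        if (fileSize < 0) throw new ArgumentOutOfRangeException(nameof(fileSize));
        if (blockSize <= 0) throw new ArgumentOutOfRangeException(nameof(blockSize));

        var blocks = new List<(string Id, long Offset, int Length)>();
        long index = 0;
        for (long offset = 0; offset < fileSize; offset += blockSize, index++)
        {
            // The final block may be shorter than blockSize.
            var length = (int)Math.Min(blockSize, fileSize - offset);

            // Fixed-width index keeps all Base64 IDs the same length.
            var id = Convert.ToBase64String(Encoding.UTF8.GetBytes(index.ToString("D10")));
            blocks.Add((id, offset, length));
        }
        return blocks;
    }
}
```

For a 10 MiB file with 4 MiB blocks, `Plan` yields three blocks of 4 MiB, 4 MiB, and 2 MiB. Because each block is described by its offset and length, the plan could also feed a parallel uploader that stages blocks independently and commits the IDs in order.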
Download with Retry
public async Task DownloadWithRetryAsync(
string containerName,
string blobName,
string outputPath,
int maxRetries = 3)
{
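// Note: the SDK already retries transient request failures (see BlobClientOptions.Retry);
// this outer loop adds whole-download retries on top, with exponential backoff
for (int attempt = 1; attempt <= maxRetries; attempt++)
{
try
{
var containerClient = _blobServiceClient.GetBlobContainerClient(containerName);
var blobClient = containerClient.GetBlobClient(blobName);
await blobClient.DownloadToAsync(outputPath);
return;
}
catch (Exception ex) when (attempt < maxRetries)
{
_logger.LogWarning(ex,
"Download attempt {Attempt} failed, retrying...",
attempt);
await Task.Delay(TimeSpan.FromSeconds(Math.Pow(2, attempt)));
}
}
// Reached only if maxRetries < 1; otherwise the final failure propagates from the loop
throw new ArgumentOutOfRangeException(nameof(maxRetries), "maxRetries must be at least 1");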
}
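The same retry pattern can be factored into a reusable helper. This is a minimal sketch, not an SDK facility: `RetryHelper` and its parameters are hypothetical, the backoff matches the `2^attempt` seconds used above, and the delay function is injectable so the logic can be exercised without actually waiting.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: generic retry with exponential backoff.
public static class RetryHelper
{
    // Delay before the next try: 2s after attempt 1, 4s after attempt 2, and so on.
    public static TimeSpan BackoffDelay(int attempt) =>
        TimeSpan.FromSeconds(Math.Pow(2, attempt));

    public static async Task<T> ExecuteAsync<T>(
        Func<Task<T>> operation,
        int maxAttempts = 3,
        Func<TimeSpan, Task> delay = null)
    {
        if (operation is null) throw new ArgumentNullException(nameof(operation));
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        // Default to a real delay; tests can pass a no-op to run instantly.
        delay ??= d => Task.Delay(d);

        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return await operation();
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                // Not the last attempt: back off, then loop to try again.
                await delay(BackoffDelay(attempt));
            }
            // On the last attempt the filter is false, so the exception propagates.
        }
    }
}
```

Catching every `Exception` keeps the sketch short; production code would usually retry only failures known to be transient.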
Using AzCopy for Large Data Transfers
# Upload entire folder
azcopy copy "C:\local\data" "https://mystorageaccount.blob.core.windows.net/container" \
--recursive
# Download with a specific number of concurrent requests
# (AzCopy v10 reads this from an environment variable, not a command-line flag)
export AZCOPY_CONCURRENCY_VALUE=8
azcopy copy "https://mystorageaccount.blob.core.windows.net/container" "C:\local\data" \
--recursive
# Sync source to destination (one-way sync)
azcopy sync "https://mystorageaccount.blob.core.windows.net/container1" \
"https://mystorageaccount.blob.core.windows.net/container2" \
--delete-destination true
# Cap the transfer rate (in megabits per second)
azcopy copy "source" "destination" --cap-mbps 400
Step 5: Monitoring and Diagnostics
Metrics and Alerts
public class StorageMonitoringService
{
private readonly BlobServiceClient _blobServiceClient;
public StorageMonitoringService(BlobServiceClient blobServiceClient)
{
_blobServiceClient = blobServiceClient;
}
public async Task<StorageMetrics> GetMetricsAsync(string containerName)
{
var containerClient = _blobServiceClient.GetBlobContainerClient(containerName);
// Container properties expose LastModified, but not a blob count or total size
var properties = await containerClient.GetPropertiesAsync();
var metrics = new StorageMetrics { LastModified = properties.Value.LastModified };
// Enumerate the blobs to total them (not real-time; slow and billable on large
// containers, where Azure Monitor capacity metrics are a better source)
await foreach (var blobItem in containerClient.GetBlobsAsync())
{
metrics.TotalBlobs++;
metrics.TotalBytes += blobItem.Properties.ContentLength ?? 0;
}
return metrics;
}
// Monitor via Azure Monitor
// Configure in Azure Portal -> Monitoring -> Metrics
// Key metrics to track:
// - Transactions (count)
// - Ingress/Egress (bytes)
// - Blob Capacity (bytes)
// - Blob Count
// - Success E2E Latency (ms)
}
// Access tier is a per-blob property, so it is not part of the container-level summary
public class StorageMetrics
{
public long TotalBlobs { get; set; }
public long TotalBytes { get; set; }
public DateTimeOffset LastModified { get; set; }
}
Logging
// Enable Azure Storage logging via diagnostic settings
// In Azure Portal: Storage Account -> Monitoring -> Diagnostic settings
// Enable:
// - StorageRead (all read operations)
// - StorageWrite (all write operations)
// - StorageDelete (all delete operations)
// Send to: Log Analytics, Storage account, or Event Hub
// Query logs in Log Analytics
/*
StorageBlobLogs
| where TimeGenerated > ago(1h)
| where OperationName == "PutBlob"
| where StatusCode == 201
| summarize count() by bin(TimeGenerated, 5m)
*/
Best Practices Summary
| Practice | Why | Implementation |
|---|---|---|
| Use appropriate tier | Cost optimization | Hot → Cool → Cold → Archive based on access |
| Lifecycle policies | Automate tier changes | Set and forget tier transitions |
| Managed identity | Security, no key management | Use DefaultAzureCredential |
| Use a CDN | Faster access globally | Put Azure Front Door or Azure CDN in front of the blob endpoint |
| Monitor costs | Track spending | Set budget alerts in Azure |
| Enable soft delete | Protect against accidental deletion | 7-30 days retention |
Conclusion
Azure Blob Storage provides:
- Scalability - Store unlimited data at low cost
- Flexibility - Multiple tiers for different needs
- Security - Encryption, RBAC, SAS tokens
- Performance - High throughput, parallel operations
- Integration - Works with Azure services seamlessly
Key takeaways:
- Choose the right storage account type for your workload
- Use lifecycle policies to automatically move data to cheaper tiers
- Use managed identity instead of connection strings
- Monitor usage and set budget alerts