I have two SQL tables with no foreign key between them, but they share one common column, which I use to join them into an in-memory list of about 1,300,000 records. That column is not unique in either table (it can occur more than once). I need to insert those 1,300,000 records from the joined in-memory list into Azure Cosmos DB using the v3 SDK with bulk execution.
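For reference, the join looks roughly like this (the row shapes and property names are placeholders, not my real schema):

// boatRows and priceRows are the two tables read into memory;
// CommonKey is the shared, non-unique column
var boatsWithPricesFirstYear =
    (from boat in boatRows
     join price in priceRows on boat.CommonKey equals price.CommonKey
     select new { boat.BoatId, price.Date, price.Price })
    .ToList(); // ~1,300,000 records, because CommonKey repeats on both sides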
The problem is that this bulk insert takes more than 10-15 minutes, even though I have set the throughput to 50,000 or 100,000 RU/s. I have tried a few approaches and none of them gave better results. The only time I got it working quickly was when I joined both tables with LINQ into a parent-children list: that produced around 2,900 documents and the bulk insert took around 33 seconds, but even after optimizing the JSON serialization (shortening strings and dates, ignoring null values, etc.) some of those documents were larger than the 2 MB document size limit.
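One workaround I have been considering is splitting each boat's prices across several smaller documents so every document stays under 2 MB. A minimal sketch, assuming hypothetical types (DailyPrice, BoatPricesChunk) and an arbitrary chunk size of 500:

using System;
using System.Collections.Generic;
using System.Linq;

public class DailyPrice { public DateTime Date { get; set; } public decimal Price { get; set; } }

public class BoatPricesChunk
{
    public string Id { get; set; }      // document id, unique per chunk
    public string BoatId { get; set; }  // partition key, shared by all of a boat's chunks
    public List<DailyPrice> Prices { get; set; }
}

public static class ChunkHelper
{
    // Split one boat's daily prices into fixed-size chunks so each
    // resulting document stays well under the 2 MB limit.
    public static IEnumerable<BoatPricesChunk> ToChunks(string boatId, IReadOnlyList<DailyPrice> prices, int chunkSize = 500)
    {
        for (var i = 0; i < prices.Count; i += chunkSize)
        {
            yield return new BoatPricesChunk
            {
                Id = $"{boatId}-{i / chunkSize}",
                BoatId = boatId,
                Prices = prices.Skip(i).Take(chunkSize).ToList()
            };
        }
    }
}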
When inserting the 1,300,000 individual documents with a throughput of 100,000 RU/s (for testing), I also get ServiceUnavailable (503) or TooManyRequests (429) exceptions.
I am using the emulator and I am new to Azure Cosmos DB.
Creating the singleton instance:
serviceCollection.AddSingleton(x =>
{
    var primaryConnectionString = _solutionConfiguration.AzureCosmosDb.PrimaryConnectionString;
    if (string.IsNullOrEmpty(primaryConnectionString))
    {
        throw new InvalidOperationException(
            "Please specify a valid CosmosDBConnection in the solutionsettings.json file or your Azure Functions settings.");
    }
    return new CosmosClientBuilder(primaryConnectionString)
        .WithSerializerOptions(new CosmosSerializationOptions
        {
            PropertyNamingPolicy = CosmosPropertyNamingPolicy.CamelCase,
            IgnoreNullValues = true
        })
        .WithRequestTimeout(TimeSpan.FromMinutes(2))
        .WithBulkExecution(true) // sets AllowBulkExecution on the client
        .Build();
});
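One thing I have been wondering is whether configuring throttling retry options on the builder would absorb the TooManyRequests (429) responses instead of surfacing them. A minimal sketch (the 60-second wait and 20 attempts are arbitrary values I picked, not recommendations):

return new CosmosClientBuilder(primaryConnectionString)
    .WithBulkExecution(true)
    // let the SDK retry throttled (429) requests itself
    .WithThrottlingRetryOptions(TimeSpan.FromSeconds(60), 20)
    .Build();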
Container creation:
public static async Task CreateContainerForDatabase(CosmosClient cosmosClient, string databaseId, string containerId, string partitionKey)
{
    Console.WriteLine(">>>>> Create New Container >>>>>");
    var containerProperties = new ContainerProperties
    {
        Id = containerId,
        PartitionKeyPath = partitionKey
    };
    // provisions the container with 150,000 RU/s of manual throughput
    var result = await cosmosClient.GetDatabase(databaseId).CreateContainerAsync(containerProperties, 150000);
    Console.WriteLine($"New container with id: {result.Container.Id} has been created");
}
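I have also wondered whether autoscale throughput would suit a bursty load like this better than fixed 150,000 RU/s, if the account supports it. A minimal sketch (the 50,000 RU/s ceiling is an arbitrary value):

// autoscale lets the container scale between 10% of the maximum and the
// maximum itself, here 5,000-50,000 RU/s
var throughput = ThroughputProperties.CreateAutoscaleThroughput(50000);
var result = await cosmosClient.GetDatabase(databaseId).CreateContainerAsync(containerProperties, throughput);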
Insert into Cosmos DB:
Console.WriteLine("Bulk insert!");
var cost = 0D;
var errors = 0;
var documentsToCreate = boatsWithPricesFirstYear.Count();
var executionStarted = DateTime.Now;
var container = _cosmosClient.GetContainer(_configuration.AzureCosmosDb.DatabaseId, _configuration.AzureCosmosDb.ContainerId);
var tasks = new List<Task>(documentsToCreate);
foreach (var boat in boatsWithPricesFirstYear)
{
var result = container.CreateItemAsync(boat, new PartitionKey(boat.BoatId));
tasks.Add(result.ContinueWith(t =>
{
if (t.Status == TaskStatus.RanToCompletion)
{
cost += t.Result.RequestCharge;
}
else
{
Console.WriteLine($"Error creating boat with id: {boat.BoatId} document: {t.Exception.Message}");
errors++;
}
}));
}
await Task.WhenAll(tasks);
Console.WriteLine($"{documentsToCreate - errors} documents has been created at cost of: {cost:0.##} RUs in {DateTime.Now.Subtract(executionStarted)}");
Console.ReadLine();
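Since queuing all 1,300,000 CreateItemAsync tasks up front is presumably what exhausts my memory, I have been thinking about bounding how many are in flight at once. A minimal sketch of what I have in mind, reusing the container and list from above (the batch size of 10,000 is an arbitrary value):

// Only one batch of tasks exists (and is awaited) at a time, which caps memory
// usage and smooths RU consumption; per-item error handling omitted for brevity.
const int batchSize = 10000;
foreach (var batch in boatsWithPricesFirstYear
    .Select((boat, index) => (boat, index))
    .GroupBy(x => x.index / batchSize, x => x.boat))
{
    var batchTasks = batch
        .Select(boat => container.CreateItemAsync(boat, new PartitionKey(boat.BoatId)))
        .ToList();
    await Task.WhenAll(batchTasks);
}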
Also, when I sum the logged request charges I get around 9 million RUs in total for all 1,300,000 documents. Is there any way to reduce this charge when inserting items? (I am aware that a single-item write on the SQL API costs around 10.2 RUs, if I am right.)
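One idea I have read about for cutting write charges is trimming the indexing policy so only the queried paths are indexed. A minimal sketch on the container properties (assuming /boatId is the only path I query on, which is just an illustration):

var containerProperties = new ContainerProperties
{
    Id = containerId,
    PartitionKeyPath = partitionKey,
    IndexingPolicy = new IndexingPolicy
    {
        Automatic = true,
        IndexingMode = IndexingMode.Consistent,
        ExcludedPaths = { new ExcludedPath { Path = "/*" } },        // index nothing by default
        IncludedPaths = { new IncludedPath { Path = "/boatId/?" } }  // opt in only the queried paths
    }
};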
When I instead build the list as a parent-children list (the boat as the parent, the prices for every day as children), I get around 2,900 documents with a total request charge of around 850,000 RUs.
I have also tried the Microsoft stream-based bulk import example (https://docs.microsoft.com/en-us/azure/cosmos-db/tutorial-sql-api-dotnet-bulk-import), but I got almost the same results.
If any information is missing, I will update this post. Thanks in advance.
P.S. My machine's memory is also exhausted while the bulk insert is in progress.