File limit on S3 equal to 1,000 (one thousand)

Is there a file limit on S3 in versions above 2.3.0?

I was using the localstack/localstack:2.3.0 image from Docker Hub, and with this image I could upload thousands of files to S3.

After upgrading to any version above 2.3.0, only 1,000 files per key are displayed.

Hello,

Are you talking about the ListObjects or ListObjectsV2 operations? Those operations are paginated and return a maximum of 1,000 objects per call, just like AWS. You need to use the pagination of those operations to get the rest of your keys.

See:

Listing the objects using Boto3 with both operations and paging, and via the AWS CLI.
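For reference, a minimal sketch of the manual continuation-token loop (the bucket name here is just a placeholder):

import boto3

s3 = boto3.client("s3", endpoint_url="http://localhost:4566")

# ListObjectsV2 returns at most 1000 keys per call: keep calling while the
# response is truncated, passing NextContinuationToken back in each time.
keys = []
kwargs = {"Bucket": "test"}
while True:
    response = s3.list_objects_v2(**kwargs)
    keys.extend(obj["Key"] for obj in response.get("Contents", []))
    if not response.get("IsTruncated"):
        break
    kwargs["ContinuationToken"] = response["NextContinuationToken"]

print("total keys:", len(keys))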

It simply does not store more than 1,000 files when using any version above 2.3.0, including 3.0.

When I use version 2.3.0 I can store thousands of files and list them using Boto3 and the AWS CLI without problems.

I have run a lot of benchmarks using S3 with more than 10 thousand objects with no issues, so I’m not quite sure what the problem is here.

I just ran this small test and it worked with LocalStack 3.0.0.

import boto3
from mypy_boto3_s3 import S3Client

s3: S3Client = boto3.client("s3", endpoint_url="http://localhost:4566")

s3.create_bucket(Bucket="test")

for i in range(3000):
    # padding the object for lex. ordering
    s3.put_object(Bucket="test", Key=f"object_{i:04}", Body="test")


list_1 = s3.list_objects_v2(Bucket="test")
continuation_token = list_1["NextContinuationToken"]
print("Print first object list 1", list_1["Contents"][0])
print("Print first object list 1", list_1["Contents"][-1])


list_2 = s3.list_objects_v2(Bucket="test", ContinuationToken=continuation_token)
continuation_token = list_2["NextContinuationToken"]
print("Print first object list 2", list_2["Contents"][0])
print("Print first object list 2", list_2["Contents"][-1])

list_3 = s3.list_objects_v2(Bucket="test", ContinuationToken=continuation_token)
print("Print first object list 3", list_3["Contents"][0])
print("Print first object list 3", list_3["Contents"][-1])

# using built-in paginator
paginator = s3.get_paginator("list_objects_v2")

result = paginator.paginate(Bucket="test").build_full_result()
print("printing amount of objects in bucket", len(result["Contents"]))

This is printing:

Print first object list 1 {'Key': 'object_0000', 'LastModified': datetime.datetime(2023, 11, 18, 22, 17, 51, tzinfo=tzutc()), 'ETag': '"098f6bcd4621d373cade4e832627b4f6"', 'Size': 4, 'StorageClass': 'STANDARD'}
Print last object list 1 {'Key': 'object_0999', 'LastModified': datetime.datetime(2023, 11, 18, 22, 17, 53, tzinfo=tzutc()), 'ETag': '"098f6bcd4621d373cade4e832627b4f6"', 'Size': 4, 'StorageClass': 'STANDARD'}
Print first object list 2 {'Key': 'object_1000', 'LastModified': datetime.datetime(2023, 11, 18, 22, 17, 53, tzinfo=tzutc()), 'ETag': '"098f6bcd4621d373cade4e832627b4f6"', 'Size': 4, 'StorageClass': 'STANDARD'}
Print last object list 2 {'Key': 'object_1999', 'LastModified': datetime.datetime(2023, 11, 18, 22, 17, 55, tzinfo=tzutc()), 'ETag': '"098f6bcd4621d373cade4e832627b4f6"', 'Size': 4, 'StorageClass': 'STANDARD'}
Print first object list 3 {'Key': 'object_2000', 'LastModified': datetime.datetime(2023, 11, 18, 22, 17, 55, tzinfo=tzutc()), 'ETag': '"098f6bcd4621d373cade4e832627b4f6"', 'Size': 4, 'StorageClass': 'STANDARD'}
Print last object list 3 {'Key': 'object_2999', 'LastModified': datetime.datetime(2023, 11, 18, 22, 17, 57, tzinfo=tzutc()), 'ETag': '"098f6bcd4621d373cade4e832627b4f6"', 'Size': 4, 'StorageClass': 'STANDARD'}
printing amount of objects in bucket 3000

Could you share exactly how you are using pagination?

Maybe I didn’t understand the issue exactly: what does "1,000 files per key" mean? Could you share a small reproducible sample?

Hi @bentsku,

First of all I want to thank you for your help.

I’m sorry, I think I didn’t really express myself well.

When I said “key” I actually meant “folder”.

About the problem, I didn’t test it by uploading the files using Boto3.

I used the AWS CLI, more specifically the command:
$ aws --endpoint http://localhost:4566 s3 sync source_folder s3://bucket/sync_folder

When executing the command against version 2.3.0 I can upload thousands of files, but if the Docker container runs any version later than that, only 1,000 files are written.

Regarding the function that lists files using Boto3, I can send it if you still want to see it.

I’ll just note that it works perfectly when the files are in fact written correctly to S3, and I can also list them through the AWS CLI using the command:
$ aws --endpoint http://localhost:4566 s3 ls s3://bucket/sync_folder/

Thank you and I apologize again for the confusion.

Hi @bentsku, I’m waiting for a staff member to review my post.

Hi @gbral, do you mean a staff member from LocalStack?

Staff member.

When I replied to the message, my post was hidden and I received this message:

Akismet has temporarily hidden your post

Alright, thanks for your answer. @HarshCasper would you be able to help here? I’m not sure how to review posts like that.

It should be visible now!

Hi @bentsku,

I think I found the source of the problem, but I still can’t explain exactly why.

When creating the Docker container with version 2.3.0 (the version in which I can run "aws s3 sync" with more than 1,000 files and everything works normally) and switching the S3 provider to v3, the problem that naturally occurs in versions later than 2.3 happens in 2.3 as well.

And the opposite is also true: if I switch the S3 provider to legacy_v2 in any version, including 3.0, uploading more than 1,000 files to S3 works normally.

Below is the command line used to create the container:

docker run -idt \
  --restart=unless-stopped \
  --network=dev \
  -v /opt/services/localstack/volume:/var/lib/localstack \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /opt/data:/data \
  -p 4566:4566 \
  -p 4510-4559:4510-4559 \
  -p 9000:9000 \
  -e DEBUG=1 \
  -e SERVICES=s3 \
  -e PROVIDER_OVERRIDE_S3=legacy_v2 \
  -e DEFAULT_REGION=sa-east-1 \
  -e AWS_ACCESS_KEY_ID=test \
  -e AWS_SECRET_ACCESS_KEY=test \
  -e DOCKER_HOST=unix:///var/run/docker.sock \
  -e VIRTUAL_HOST=localhost.localstack \
  --name localstack \
  localstack/localstack:3.0.0

If I remove the line "-e PROVIDER_OVERRIDE_S3=legacy_v2", running an "aws s3 sync" command with more than 1,000 files uploads only 1,000 of them.

Hello @gbral and thanks again for the detailed report.

So the issue is with the AWS CLI sync command when trying to upload more than 1,000 files. I’ll try to reproduce it locally and will update you here. Thanks!

Hello @gbral

I’m afraid I cannot reproduce the issue: I’ve just synced a folder with around 2200 files and I could see all of them properly.

I’ll describe exactly what I’ve done:

import os

import boto3
from mypy_boto3_s3 import S3Client

s3: S3Client = boto3.client("s3", endpoint_url="http://localhost:4566")

s3.create_bucket(Bucket="test")


filepath = "/tmp/testsync"
os.mkdir(filepath)

for i in range(2200):
    with open(f"{filepath}/test_{i:04}.txt", "w") as fp:
        fp.write("test")

From there, I ran the AWS CLI commands like you did:
awslocal s3 sync /tmp/testsync s3://test/test-folder
This printed 2200 upload lines, counted with wc -l

Then I ran the following:
awslocal s3 ls s3://test/test-folder/
And it again printed 2200 lines:

awslocal s3 ls s3://test/test-folder/ | wc -l
     2200
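
If you want to double-check on your side whether the objects are actually missing or are only being cut off in a single listing, here is a minimal Boto3 sketch (the bucket and prefix names are placeholders taken from your sync command) that counts everything via the paginator:

import boto3

s3 = boto3.client("s3", endpoint_url="http://localhost:4566")

# Count every object under the prefix using the built-in paginator, so the
# total is not capped at the 1000-key page size of ListObjectsV2.
paginator = s3.get_paginator("list_objects_v2")
total = sum(
    len(page.get("Contents", []))
    for page in paginator.paginate(Bucket="bucket", Prefix="sync_folder/")
)
print("objects under sync_folder/:", total)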

I’m not sure how to reproduce your issue.
Could you also set LS_LOG=trace and share the debug output when you run those commands?
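For example, you can pass it the same way as DEBUG, by adding -e LS_LOG=trace to the docker run command you shared above.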