Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TUS upload support (WIP) #65

Closed
2 tasks
felix-schwarz opened this issue Apr 23, 2020 · 4 comments
Closed
2 tasks

TUS upload support (WIP) #65

felix-schwarz opened this issue Apr 23, 2020 · 4 comments
Assignees

Comments

@felix-schwarz
Copy link
Collaborator

felix-schwarz commented Apr 23, 2020

The SDK should provide support for uploading files via the TUS protocol.

Notable observations from reading the spec:

1. Background NSURLSession friendly

Requests / PATCH

The Server SHOULD always attempt to store as much of the received data as possible.

If the server does store as much of the received data as possible, the SDK has an easier time to comply with requirements for NSURLSession background queues and avoiding penalties:

  • it can send the entire file in one request
  • if the connection is interrupted:
    • the server (ideally) keeps the received data
    • the client uses a HEAD request to retrieve the offset to resume from
    • it sends the rest of the file from that offset in a single request
  • this way, consecutive requests in close timely proximity - something that is penalized by NSURLSession with long delays - can be mostly avoided

2. Schedulable

Protocol Extensions / Creation

The Client and the Server SHOULD implement the upload creation extension.

It would be preferable if clients were able to directly start an upload with a single request:

  • upload can be scheduled in a background NSURLSession as a single request if Creation With Upload extension is implemented
  • plain Creation extension would require additional request to "register" the upload first:
    • requires the app to be active again before the actual upload can be scheduled
    • could be seen as pattern that's penalized by NSURLSession with long delays

3. Defined expiration

Protocol Extensions / Upload-Expires

The `Upload-Expires response header indicates the time after which the unfinished upload expires. A Server MAY wish to remove incomplete uploads after a given period of time to prevent abandoned uploads from taking up extra storage. The Client SHOULD use this header to determine if an upload is still valid before attempting to resume the upload.

This helps in determining whether an upload should be continued or not - and resume only uploads that are known to still be around. On the other hand, a HEAD request would be required anyway before resuming an upload, at which point an expired upload should also become apparent.

If the expiration date should be supported and utilized, though, adding support for expiration directly to the OCHTTP system should be considered, with requests being terminated with a new error code in case they have expired before having been scheduled.

4. Checksum troubles

The Checksum extension needs checksums to be provided on a per-request basis, calculated not over the entire file but over the body of the respective upload requests.

This is generally fine, but does not cover the scenario where an upload is interrupted and the server should use the already received bytes:

  • the checksum over the partial upload will differ:
    • will the server reject all data received with that request over this?
    • if the server accepts the data, how can overall file consistency be verified

Possible solutions:

1. Store checksums, check when upload is complete

The specification provides this hint:

Once the entire request has been received, the Server MUST verify the uploaded chunk against the provided checksum using the specified algorithm.

The solution therefore could be to store all checksums and only verify them against the respective parts once they have been received in full.

Drawback:

  • if an upload is resumed, the ranges to compute checksums on could overlap, generating unnecessary load.

2. Custom header with the full file checksum

A custom header with the full file checksum (f.ex. OC-Full-Upload-Checksum) is passed to the Create With Upload extension when the upload is initiated. That would allow verification of the full file once the upload has completed.

Drawback:

  • a corrupted upload will only be detected once the entire file has been transfered

3. Custom header with the checksum over already transferred data

An additional, custom header with the checksum of the file up to the point the upload resumes from (f.ex. OC-Transmitted-Upload-Checksum) would allow the server to check if the data received before is consistent - and allow the server to cancel an already.

Drawback:

  • parts of the file could be checksummed multiple times, generating unnecessary load

Drawback mitigation:

  • that drawback could be mitigated by the server returning a byte range with HEAD requests, for which the client should provide a checksum when resuming the upload – allowing the server to accept partial requests while ensuring consistency

Pragmatic and performant

A pragmatic and performance-oriented approach would likely by a combination of 1. and 3.

Related issues

Known issues

The current implementation in develop has the following known issues:

  • uploads may sometimes not resume and "hang" indefinitely if the app is force-terminated. To test resuming, log out and back into an account.
  • no progress is reported for uploads
@felix-schwarz felix-schwarz self-assigned this Apr 23, 2020
@butonic
Copy link
Member

butonic commented Apr 28, 2020

1. core protocol

  • is implemented ✔️

2. creation-with-upload extension

We can implement this on several endpoints:

  • The dataprovider in reva
  • The folders in ocdav

To directly start an upload at the dataprovider clients would have to send the POST to the datagateway at /data because it needs to look up which dataprovider is actually responsible for the upload.

To implement the create extension we already do the necessary stat and InitiateFileUpload requests in ocdav and itr should be possiple to support the create-with-uploadextension as well.

The js tus client tries to send data with the initial request and falls back to just create if the server does not support it. So we should implement it asap. This is an easy win and halves thke number of requests.

3. expiration extension

should be implemented, but we need to agree on default expiry ... 24h?

4. checksum extension

The checksum extension only really makes sense when using multiple PATCH requests or when implementing and using the concatenation as well, because a failed PATCH request MUST discard the received bytes for the request. We can either implement Add Tus-Min/Max-Chunk-Size headers whis is not yet official but absolutely makes sense, because it would allow the admin to configure a sena chunk size for his deployment. Clients could then send the chunks using PATCH requests with a checksum per request. To consume all of the bandwith the concatenation extension can be used to allow uploading the chunks in parallel.

To mimic OC10 checksumming, I allow providing a checksum in the metadata that can be checked on the server side if the full file is available.

I prefer using Tus-Min/Max-Chunk-Size headers: it allows resuming and can be tuned by the admin as necessary. No urgent need to go parallel & concatenation right now ... someday, maybe.

If we went to invent a new checksumming extension we should create that as a PR in https://github.com/tus/tus-resumable-upload-protocol.

@michaelstingl
Copy link
Contributor

3. expiration extension

should be implemented, but we need to agree on default expiry ... 24h?

Okay. Same as previous chunking implementations.

@michaelstingl
Copy link
Contributor

post with metadata currently uses metadata to send the checksum: https://github.com/cs3org/reva/pull/674/files#diff-5fec0456a6ea9fb1227335fc8d3f8cfdR150

@felix-schwarz
Copy link
Collaborator Author

felix-schwarz commented May 27, 2020

TUS support development has moved to develop #61

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants