Skip to content
This repository was archived by the owner on Feb 13, 2026. It is now read-only.

xl: Add '.CORRUPTED' prefix to corrupted objects#6592

Merged
kannappanr merged 1 commit intominio:masterfrom
vadmeste:issue/6575
Oct 16, 2018
Merged

xl: Add '.CORRUPTED' prefix to corrupted objects#6592
kannappanr merged 1 commit intominio:masterfrom
vadmeste:issue/6575

Conversation

@vadmeste
Copy link
Member

@vadmeste vadmeste commented Oct 8, 2018

Description

getObjectInfo() checks if a given object has enough xl.json in disks, returns 404 if not and added .CORRUPTED suffix to object.

Motivation and Context

Fixes #6575 and #6591

Regression

No

How Has This Been Tested?

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added unit tests to cover my changes.
  • I have added/updated functional tests in mint. (If yes, add mint PR # here: )
  • All new and existing tests passed.

@vadmeste vadmeste changed the title xl: Make an object present in N/2 disks visible. xl: Make an object visible when present in N/2 disks Oct 8, 2018
@codecov
Copy link

codecov bot commented Oct 8, 2018

Codecov Report

Merging #6592 into master will decrease coverage by 0.63%.
The diff coverage is 33.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #6592      +/-   ##
==========================================
- Coverage   55.45%   54.81%   -0.64%     
==========================================
  Files         251      249       -2     
  Lines       38330    31367    -6963     
==========================================
- Hits        21254    17193    -4061     
+ Misses      15124    12211    -2913     
- Partials     1952     1963      +11
Impacted Files Coverage Δ
cmd/xl-v1-object.go 75.86% <33.33%> (-0.79%) ⬇️
cmd/gateway-startup-msg.go 50% <0%> (-11.12%) ⬇️
cmd/posix-errors.go 38.33% <0%> (-11%) ⬇️
cmd/web-router.go 69.69% <0%> (-8.09%) ⬇️
cmd/sts-handlers.go 14.63% <0%> (-8.01%) ⬇️
cmd/fs-v1-helpers.go 63.82% <0%> (-6.21%) ⬇️
cmd/fs-v1-rwpool.go 66.66% <0%> (-5.75%) ⬇️
cmd/retry.go 81.81% <0%> (-5.69%) ⬇️
cmd/config-migrate.go 48.39% <0%> (-4.4%) ⬇️
cmd/bucket-handlers-listobjects.go 42.22% <0%> (-3.45%) ⬇️
... and 121 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 21c8693...72e664d. Read the comment docs.

Copy link
Member

@harshavardhana harshavardhana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as agreed and tested.

kannappanr
kannappanr previously approved these changes Oct 8, 2018
Copy link
Contributor

@kannappanr kannappanr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kannappanr
Copy link
Contributor

Here is what was discussed on what needs to be done

  1. Object without read quorum will be displayed with a .CORRUPTED extension
  2. User should be able to delete this object using the RemoveObject API
  3. There should be a best effort to be able to download a partial object.

@vadmeste
Copy link
Member Author

vadmeste commented Oct 9, 2018

  1. Object without read quorum will be displayed with a .CORRUPTED extension

What's the real reason for this?

When we will add .CORRUPTED extension ? I suggest that healing should do this, I mean renaming objects that healing was unable to heal to add .CORRUTPED extension.

@harshavardhana
Copy link
Member

Well there are problems adding extensions causes Listing to be all clobbered up. So I think the solution needs more discussion.

@vadmeste
Copy link
Member Author

vadmeste commented Oct 9, 2018

Well there are problems adding extensions causes Listing to be all clobbered up.

yeah I agree, that's why I suggested that healing does the rename of these objects to add .CORRUPTED extension, but also I am not sure why we add that extension in the first place.

@harshavardhana
Copy link
Member

Will explain @vadmeste here in a bit about final approach, just discussed.

Copy link
Member

@harshavardhana harshavardhana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I discussed with @abperiasamy the changes here are fine but we also have to make few more changes, i.e we should rename the object which are stale with "objectName.CORRUPTED" atomically.

Also allow them to be available in the listing output such that they can be deleted by mc rm.

What we need to do should be carefully understood here

  • First we read the xl.json from all disks and then ascertain how many disks its missing
  • Then pick one of the common xl.json and decide the quorum.
  • Once the quorum is obtained see how many places the object still exists, rename all of them as objectName.corrupted
  • Recreate empty objects on non-existent disks such as objectName.CORRUPTED with xl.json
  • This would allow for quorum and for the objects to be deleted.

All of this happens inside a ReadLock which is okay since other write locks will be blocked, we don't need to acquire a new write lock. Similar to healing but we are actively doing this during GetObjectInfo()

@vadmeste vadmeste changed the title xl: Make an object visible when present in N/2 disks xl: Add '.CORRUPTED' prefix to corrupted objects Oct 11, 2018
@vadmeste
Copy link
Member Author

  • Recreate empty objects on non-existent disks such as objectName.CORRUPTED with xl.json
  • This would allow for quorum and for the objects to be deleted.

@harshavardhana I think recreating empty object is not needed.

I know that the general rule is that we should not allow deleting an object if there is no write quorum
but I think we can consider errFileNotFound errors reported by some disks as success in case of DELETE. Hence, I think it is okay to allow deleting if the number errs of 'nil' & 'errFileNotFound' >= writeQuorum.

@vadmeste
Copy link
Member Author

I know that the general rule is that we should not allow deleting an object if there is no write quorum
but I think we can consider errFileNotFound errors reported by some disks as success in case of DELETE. Hence, I think it is okay to allow deleting if the number errs of 'nil' & 'errFileNotFound' >= writeQuorum.

By the way, that's how it is working right now. I investigated this before.

@vadmeste vadmeste changed the title xl: Add '.CORRUPTED' prefix to corrupted objects [WIP] xl: Add '.CORRUPTED' prefix to corrupted objects Oct 11, 2018
@vadmeste vadmeste force-pushed the issue/6575 branch 2 times, most recently from 284f30d to 8783b63 Compare October 11, 2018 16:16
Copy link
Member

@harshavardhana harshavardhana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this PR ready?

@minio-ops
Copy link

Error running mint automation
+ mkdir -p 6592-71a59c6/gopath/src/github.com/minio
+ git clone --quiet https://github.com/minio/minio.git 6592-71a59c6/gopath/src/github.com/minio/minio
++ cd 6592-71a59c6/gopath/src/github.com/minio/minio
++ git remote add minio https://github.com/minio/minio.git
++ cd 6592-71a59c6/gopath/src/github.com/minio/minio
++ git fetch --quiet minio pull/6592/head:pr6592
++ cd 6592-71a59c6/gopath/src/github.com/minio/minio
++ git checkout --quiet pr6592
++ minikube ip
+ minikube_ip=192.168.39.156
++ mkdir -p certs
++ cd certs
++ ./cert -host 192.168.39.156
2018/10/13 14:14:12 wrote public.crt
2018/10/13 14:14:12 wrote private.key
+ GOPATH=/mnt/nvme0n1p1/mint-auto/mint-auto/6592-71a59c6/gopath
+ make -C 6592-71a59c6/gopath/src/github.com/minio/minio --quiet
++ minikube docker-env
+ eval 'export DOCKER_TLS_VERIFY="1"
export DOCKER_HOST="tcp://192.168.39.156:2376"
export DOCKER_CERT_PATH="/mnt/nvme0n1p1/mint-auto/.minikube/certs"
export DOCKER_API_VERSION="1.35"
# Run this command to configure your shell:
# eval $(minikube docker-env)'
++ export DOCKER_TLS_VERIFY=1
++ DOCKER_TLS_VERIFY=1
++ export DOCKER_HOST=tcp://192.168.39.156:2376
++ DOCKER_HOST=tcp://192.168.39.156:2376
++ export DOCKER_CERT_PATH=/mnt/nvme0n1p1/mint-auto/.minikube/certs
++ DOCKER_CERT_PATH=/mnt/nvme0n1p1/mint-auto/.minikube/certs
++ export DOCKER_API_VERSION=1.35
++ DOCKER_API_VERSION=1.35
+ cp -af Dockerfile.mint-auto 6592-71a59c6/gopath/src/github.com/minio/minio/
+ docker build --quiet --tag minio/minio-6592-71a59c6 --file 6592-71a59c6/gopath/src/github.com/minio/minio/Dockerfile.mint-auto 6592-71a59c6/gopath/src/github.com/minio/minio
Sending build context to Docker daemon  113.6MB

Step 1/10 : FROM alpine:3.7
 ---> 34ea7509dcad
Step 2/10 : MAINTAINER Minio Inc <dev@minio.io>
 ---> Using cache
 ---> 00ec607a8456
Step 3/10 : COPY minio dockerscripts/docker-entrypoint.sh dockerscripts/healthcheck.sh /usr/bin/
 ---> e077ada902df
Step 4/10 : ENV MINIO_UPDATE off
 ---> Running in f5a744b5e61f
Removing intermediate container f5a744b5e61f
 ---> e0b1621543a7
Step 5/10 : RUN      apk add --no-cache ca-certificates curl su-exec>=0.2 &&      echo 'hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4' >> /etc/nsswitch.conf &&      chmod +x /usr/bin/minio &&      chmod +x /usr/bin/docker-entrypoint.sh &&      chmod +x /usr/bin/healthcheck.sh
 ---> Running in 1ef286d335f9
�[91mWARNING: Ignoring http://dl-cdn.alpinelinux.org/alpine/v3.7/main/x86_64/APKINDEX.tar.gz: temporary error (try again later)
�[0m�[91mERROR: unsatisfiable constraints:
�[0mThe command '/bin/sh -c apk add --no-cache ca-certificates curl su-exec>=0.2 &&      echo 'hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4' >> /etc/nsswitch.conf &&      chmod +x /usr/bin/minio &&      chmod +x /usr/bin/docker-entrypoint.sh &&      chmod +x /usr/bin/healthcheck.sh' returned a non-zero code: 3

@vadmeste
Copy link
Member Author

This is not WIP anymore, but the code doesn't create empty files. As I said, Delete API already doesn't count nonexistent object in some disks as an error since this is already a confirmation of the Delete action. So Delete actually can remove objects that don't have quorum.

This PR modifies getObjectInfo(), but we should notice that this is a dangerous change because if there is any mistake anywhere, mc ls -r can rename all objects to .CORRUPTED and this also can happen for any regression in posix, storage rest server/client code. I am really not sure if it is worthy to do the rename in get object info in the first place.

Copy link
Member

@harshavardhana harshavardhana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to be extra careful yes, but it is okay to make the object a user interactive responsibility.

We are not actively deleting anything so it is okay.

}
}
// If xl.json is not present in read quorum disks,
// add .CORRUPTION prefix to the current object.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.CORRUPTED typo here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.CORRUPTED typo here.

Fixed

// If xl.json is not present in read quorum disks,
// add .CORRUPTION prefix to the current object.
if len(xl.getDisks())-notFoundXLJSON < readQuorum {
rename(ctx, xl.getDisks(), bucket, object, bucket, object+xlCorruptedSuffix, true, len(xl.getDisks())/2+1, []error{errFileNotFound})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can keep the quorum similar to global standard class ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can keep the quorum similar to global standard class ?

Fixed

@vadmeste vadmeste changed the title [WIP] xl: Add '.CORRUPTED' prefix to corrupted objects xl: Add '.CORRUPTED' prefix to corrupted objects Oct 14, 2018
@vadmeste vadmeste force-pushed the issue/6575 branch 2 times, most recently from acb20f9 to 759cae0 Compare October 14, 2018 20:17
@vadmeste
Copy link
Member Author

ping @harshavardhana @kannappanr

harshavardhana
harshavardhana previously approved these changes Oct 16, 2018
@minio minio deleted a comment from minio-ops Oct 16, 2018
getObjectInfo() checks if a given object has enough xl.json in disks,
returns 404 if not and added .CORRUPTED suffix to object.
@minio-ops
Copy link

Mint Automation

Test Result
mint-tls.sh ✔️
mint-xl.sh ✔️
mint-gateway-nas.sh ✔️
mint-large-bucket.sh ✔️
mint-worm.sh ✔️
mint-fs.sh ✔️
mint-dist-xl.sh ✔️

Copy link
Contributor

@kannappanr kannappanr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kannappanr
Copy link
Contributor

@kannappanr kannappanr merged commit 362ebdc into minio:master Oct 16, 2018
harshavardhana added a commit to harshavardhana/minio that referenced this pull request Oct 18, 2018
This PR completes the missing functionality from minio#6592
harshavardhana added a commit to harshavardhana/minio that referenced this pull request Oct 19, 2018
This PR completes the missing functionality from minio#6592
kannappanr pushed a commit that referenced this pull request Oct 19, 2018
This PR completes the missing functionality from #6592
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unable to replace a file not having read quorum with non standard SC

5 participants