@tassa-yoniso-manasi-karoto
Contributor

Description

This PR fixes a critical bug that prevents the Docker image from building. The Dockerfile runs COPY . . without first setting a WORKDIR, and every subsequent RUN command then fails:

$ sudo docker build -t pythainlp-test .
Password: 
DEPRECATED: The legacy builder is deprecated and will be removed in a future release.
            Install the buildx component to build images with BuildKit:
            https://docs.docker.com/go/buildx/

Sending build context to Docker daemon  148.5MB
Step 1/8 : FROM python:3.12
 ---> b3d245705d9c
Step 2/8 : COPY . .
 ---> aed91cc8d978
Step 3/8 : RUN apt-get update && apt-get install -y --no-install-recommends build-essential libicu-dev  python3-pip python3-venv pkg-config && rm -rf /var/lib/apt/lists/*
 ---> Running in 64978c3d47e5
failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: exec: "/bin/sh": stat /bin/sh: no such file or directory

Changes Made

  • Added WORKDIR /app directive before COPY . . in Dockerfile
  • Added docker-compose.yml to improve the development workflow: it handles volume mounting, environment variables, and container management, so a single command such as docker-compose run pythainlp replaces lengthy docker run commands

The Issue

The repository contains a bin directory. When COPY . . executes without a WORKDIR, it copies all repository contents into the container's root (/). After that, the system's /bin/sh is no longer accessible to subsequent RUN commands. The exact mechanism is unclear; one possibility is that the repository's bin directory overwrites the system /bin.

Setting WORKDIR before COPY ensures repository contents are isolated in /app, preventing any interference with system directories.
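
For illustration, the corrected ordering might look like the sketch below. Only the FROM, COPY, and apt-get steps are taken from the build log above, and WORKDIR /app is placed as described in this PR. The remaining steps of the actual Dockerfile are not reproduced in this description, so they are left elided rather than guessed.

```dockerfile
FROM python:3.12

# Set the working directory first so that "COPY . ." lands in /app
# instead of the container's root (/), keeping the repository's bin/
# directory away from the system /bin.
WORKDIR /app

# The relative destination "." now resolves to /app.
COPY . .

# Same package installation step as shown in the build log.
RUN apt-get update && apt-get install -y --no-install-recommends build-essential libicu-dev python3-pip python3-venv pkg-config && rm -rf /var/lib/apt/lists/*

# ... remaining build steps elided (not shown in this PR description)
```

With this ordering, relative paths in later COPY and RUN instructions also resolve against /app.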

Testing

  • Confirmed the original Dockerfile always fails at the RUN step after COPY
  • Successfully built the image with WORKDIR added
  • Tested PyThaiNLP functionality: docker run -it pythainlp python -c "from pythainlp.tokenize import word_tokenize; print(word_tokenize('สวัสดีครับ'))"
  • Tested docker-compose workflow: docker-compose run pythainlp python

Additional Notes

During testing, I discovered that the CLI (the thainlp command) produces no output in Docker containers, though the Python API works correctly. I've reported this separately in issue #1131.

@bact added the bug (bugs in the library) label on Aug 1, 2025
@coveralls

Coverage Status

Coverage: 52.88% (remained the same) when pulling c712031 on tassa-yoniso-manasi-karoto:fix-dockerfile into a069230 on PyThaiNLP:dev.

@bact merged commit 3158296 into PyThaiNLP:dev on Aug 1, 2025
28 checks passed
@bact
Member

bact commented Aug 1, 2025

Merged with 2 approvals.
Thank you @tassa-yoniso-manasi-karoto

@tassa-yoniso-manasi-karoto deleted the fix-dockerfile branch on August 4, 2025 08:32