Feat/foundation#1
Conversation
- Add main analyzer script for GitHub repository analysis - Include exported data and repository list
Summary of ChangesHello @amargiovanni, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request establishes a foundational framework for a structured, spec-driven development workflow. It introduces a comprehensive set of new Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive framework for spec-driven development, including command definitions, helper scripts, and templates. It also adds a standalone GitHub repository analyzer script. The changes are foundational and set up a structured workflow. My review focuses on improving the robustness and maintainability of the new scripts and ensuring the correctness of the defined processes. I've identified several critical and high-severity issues in the Python analyzer and bash scripts that should be addressed to ensure the tools work correctly and performantly.
| - Create parallel execution examples per user story | ||
| - Validate task completeness (each user story has all needed tasks, independently testable) | ||
|
|
||
| 4. **Generate tasks.md**: Use `.specify.specify/templates/tasks-template.md` as structure, fill with: |
There was a problem hiding this comment.
There is a typo in the path to the tasks template. The path .specify.specify/ has specify repeated. This will cause the command to fail when it tries to find the template file.
| 4. **Generate tasks.md**: Use `.specify.specify/templates/tasks-template.md` as structure, fill with: | |
| 4. **Generate tasks.md**: Use `.specify/templates/tasks-template.md` as structure, fill with: |
| update_existing_agent_file() { | ||
| local target_file="$1" | ||
| local current_date="$2" | ||
|
|
||
| log_info "Updating existing agent context file..." | ||
|
|
||
| # Use a single temporary file for atomic update | ||
| local temp_file | ||
| temp_file=$(mktemp) || { | ||
| log_error "Failed to create temporary file" | ||
| return 1 | ||
| } | ||
|
|
||
| # Process the file in one pass | ||
| local tech_stack=$(format_technology_stack "$NEW_LANG" "$NEW_FRAMEWORK") | ||
| local new_tech_entries=() | ||
| local new_change_entry="" | ||
|
|
||
| # Prepare new technology entries | ||
| if [[ -n "$tech_stack" ]] && ! grep -q "$tech_stack" "$target_file"; then | ||
| new_tech_entries+=("- $tech_stack ($CURRENT_BRANCH)") | ||
| fi | ||
|
|
||
| if [[ -n "$NEW_DB" ]] && [[ "$NEW_DB" != "N/A" ]] && [[ "$NEW_DB" != "NEEDS CLARIFICATION" ]] && ! grep -q "$NEW_DB" "$target_file"; then | ||
| new_tech_entries+=("- $NEW_DB ($CURRENT_BRANCH)") | ||
| fi | ||
|
|
||
| # Prepare new change entry | ||
| if [[ -n "$tech_stack" ]]; then | ||
| new_change_entry="- $CURRENT_BRANCH: Added $tech_stack" | ||
| elif [[ -n "$NEW_DB" ]] && [[ "$NEW_DB" != "N/A" ]] && [[ "$NEW_DB" != "NEEDS CLARIFICATION" ]]; then | ||
| new_change_entry="- $CURRENT_BRANCH: Added $NEW_DB" | ||
| fi | ||
|
|
||
| # Check if sections exist in the file | ||
| local has_active_technologies=0 | ||
| local has_recent_changes=0 | ||
|
|
||
| if grep -q "^## Active Technologies" "$target_file" 2>/dev/null; then | ||
| has_active_technologies=1 | ||
| fi | ||
|
|
||
| if grep -q "^## Recent Changes" "$target_file" 2>/dev/null; then | ||
| has_recent_changes=1 | ||
| fi | ||
|
|
||
| # Process file line by line | ||
| local in_tech_section=false | ||
| local in_changes_section=false | ||
| local tech_entries_added=false | ||
| local changes_entries_added=false | ||
| local existing_changes_count=0 | ||
| local file_ended=false | ||
|
|
||
| while IFS= read -r line || [[ -n "$line" ]]; do | ||
| # Handle Active Technologies section | ||
| if [[ "$line" == "## Active Technologies" ]]; then | ||
| echo "$line" >> "$temp_file" | ||
| in_tech_section=true | ||
| continue | ||
| elif [[ $in_tech_section == true ]] && [[ "$line" =~ ^##[[:space:]] ]]; then | ||
| # Add new tech entries before closing the section | ||
| if [[ $tech_entries_added == false ]] && [[ ${#new_tech_entries[@]} -gt 0 ]]; then | ||
| printf '%s\n' "${new_tech_entries[@]}" >> "$temp_file" | ||
| tech_entries_added=true | ||
| fi | ||
| echo "$line" >> "$temp_file" | ||
| in_tech_section=false | ||
| continue | ||
| elif [[ $in_tech_section == true ]] && [[ -z "$line" ]]; then | ||
| # Add new tech entries before empty line in tech section | ||
| if [[ $tech_entries_added == false ]] && [[ ${#new_tech_entries[@]} -gt 0 ]]; then | ||
| printf '%s\n' "${new_tech_entries[@]}" >> "$temp_file" | ||
| tech_entries_added=true | ||
| fi | ||
| echo "$line" >> "$temp_file" | ||
| continue | ||
| fi | ||
|
|
||
| # Handle Recent Changes section | ||
| if [[ "$line" == "## Recent Changes" ]]; then | ||
| echo "$line" >> "$temp_file" | ||
| # Add new change entry right after the heading | ||
| if [[ -n "$new_change_entry" ]]; then | ||
| echo "$new_change_entry" >> "$temp_file" | ||
| fi | ||
| in_changes_section=true | ||
| changes_entries_added=true | ||
| continue | ||
| elif [[ $in_changes_section == true ]] && [[ "$line" =~ ^##[[:space:]] ]]; then | ||
| echo "$line" >> "$temp_file" | ||
| in_changes_section=false | ||
| continue | ||
| elif [[ $in_changes_section == true ]] && [[ "$line" == "- "* ]]; then | ||
| # Keep only first 2 existing changes | ||
| if [[ $existing_changes_count -lt 2 ]]; then | ||
| echo "$line" >> "$temp_file" | ||
| ((existing_changes_count++)) | ||
| fi | ||
| continue | ||
| fi | ||
|
|
||
| # Update timestamp | ||
| if [[ "$line" =~ \*\*Last\ updated\*\*:.*[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] ]]; then | ||
| echo "$line" | sed "s/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/$current_date/" >> "$temp_file" | ||
| else | ||
| echo "$line" >> "$temp_file" | ||
| fi | ||
| done < "$target_file" | ||
|
|
||
| # Post-loop check: if we're still in the Active Technologies section and haven't added new entries | ||
| if [[ $in_tech_section == true ]] && [[ $tech_entries_added == false ]] && [[ ${#new_tech_entries[@]} -gt 0 ]]; then | ||
| printf '%s\n' "${new_tech_entries[@]}" >> "$temp_file" | ||
| tech_entries_added=true | ||
| fi | ||
|
|
||
| # If sections don't exist, add them at the end of the file | ||
| if [[ $has_active_technologies -eq 0 ]] && [[ ${#new_tech_entries[@]} -gt 0 ]]; then | ||
| echo "" >> "$temp_file" | ||
| echo "## Active Technologies" >> "$temp_file" | ||
| printf '%s\n' "${new_tech_entries[@]}" >> "$temp_file" | ||
| tech_entries_added=true | ||
| fi | ||
|
|
||
| if [[ $has_recent_changes -eq 0 ]] && [[ -n "$new_change_entry" ]]; then | ||
| echo "" >> "$temp_file" | ||
| echo "## Recent Changes" >> "$temp_file" | ||
| echo "$new_change_entry" >> "$temp_file" | ||
| changes_entries_added=true | ||
| fi | ||
|
|
||
| # Move temp file to target atomically | ||
| if ! mv "$temp_file" "$target_file"; then | ||
| log_error "Failed to update target file" | ||
| rm -f "$temp_file" | ||
| return 1 | ||
| fi | ||
|
|
||
| return 0 | ||
| } |
There was a problem hiding this comment.
The function update_existing_agent_file is very complex for a shell script. It relies on line-by-line processing and parsing section headers, which is fragile and hard to maintain. A small change in the template's formatting could break this script.
A more robust and maintainable approach would be to use markers in the template file to denote sections that should be automatically updated. For example:
## Active Technologies
<!-- ACTIVE_TECHNOLOGIES_START -->
<!-- ACTIVE_TECHNOLOGIES_END -->
## Recent Changes
<!-- RECENT_CHANGES_START -->
<!-- RECENT_CHANGES_END -->The script could then replace the content between these markers, which is much safer than parsing the file structure. This pattern is already used for MANUAL ADDITIONS in the template. I strongly recommend refactoring this function to use a marker-based approach for better long-term maintainability.
| sha = commit.get("sha", "") | ||
| self._log(f" Commit {idx}/{total}: {sha[:7]} - Recupero dettagli...", "debug") | ||
| detail_url = f"{self.base_url}/repos/{owner}/{repo}/commits/{sha}" | ||
| detail = self._make_request(detail_url) |
There was a problem hiding this comment.
The fetch_commits function fetches a list of commits and then makes a separate API call for each commit to get its details. This is an N+1 query problem and will be very slow, likely hitting API rate limits on repositories with many commits.
Consider using the GitHub GraphQL API to fetch all required data in a single request. This would dramatically improve performance and reduce the number of API calls.
Alternatively, assess if all the information from the detail endpoint is strictly necessary. If not, you might be able to avoid the extra calls.
This is a significant architectural change, so no direct code suggestion is provided, but it's a critical performance issue to address.
| created_at = pr.get("created_at", "") | ||
| if created_at: | ||
| created_date = datetime.fromisoformat(created_at.replace("Z", "+00:00")) | ||
| if created_date.replace(tzinfo=None) < self.since_date: | ||
| continue |
There was a problem hiding this comment.
The current implementation for fetching pull requests might miss some data. It sorts by updated and then filters by created_at on the client side. This means a PR created long ago but not updated within the analysis window will be missed.
A more reliable approach is to use the GitHub Search API, which allows filtering by creation date directly on the server. This is more accurate and efficient.
You could construct a search query like: q=repo:{owner}/{repo} is:pr created:>={self.since_date.isoformat()} and use the /search/issues endpoint.
This is a significant logic change, so I'm not providing a direct code suggestion, but this change is critical for data accuracy.
|
|
||
| d. Run the script `.specify/scripts/bash/create-new-feature.sh --json "$ARGUMENTS"` with the calculated number and short-name: | ||
| - Pass `--number N+1` and `--short-name "your-short-name"` along with the feature description | ||
| - Bash example: `.specify/scripts/bash/create-new-feature.sh --json "$ARGUMENTS" --json --number 5 --short-name "user-auth" "Add user authentication"` |
There was a problem hiding this comment.
The example command for create-new-feature.sh has the --json flag repeated. This could cause confusion or errors if a user copies and pastes the command.
| - Bash example: `.specify/scripts/bash/create-new-feature.sh --json "$ARGUMENTS" --json --number 5 --short-name "user-auth" "Add user authentication"` | |
| - Bash example: `.specify/scripts/bash/create-new-feature.sh --json "$ARGUMENTS" --number 5 --short-name "user-auth" "Add user authentication"` |
| 1. From the executed script, extract the path to **tasks**. | ||
| 1. Get the Git remote by running: | ||
|
|
||
| ```bash | ||
| git config --get remote.origin.url | ||
| ``` | ||
|
|
||
| **ONLY PROCEED TO NEXT STEPS IF THE REMOTE IS A GITHUB URL** | ||
|
|
||
| 1. For each task in the list, use the GitHub MCP server to create a new issue in the repository that is representative of the Git remote. |
There was a problem hiding this comment.
The numbered list in the outline is incorrectly formatted, with multiple items numbered as 1.. This should be corrected to be a sequential list for clarity.
| 1. From the executed script, extract the path to **tasks**. | |
| 1. Get the Git remote by running: | |
| ```bash | |
| git config --get remote.origin.url | |
| ``` | |
| **ONLY PROCEED TO NEXT STEPS IF THE REMOTE IS A GITHUB URL** | |
| 1. For each task in the list, use the GitHub MCP server to create a new issue in the repository that is representative of the Git remote. | |
| 2. From the executed script, extract the path to **tasks**. | |
| 3. Get the Git remote by running: | |
| ```bash | |
| git config --get remote.origin.url |
ONLY PROCEED TO NEXT STEPS IF THE REMOTE IS A GITHUB URL
- For each task in the list, use the GitHub MCP server to create a new issue in the repository that is representative of the Git remote.
| except Exception as e: | ||
| self._log(f"Errore richiesta: {e}", "error", force=True) | ||
| return None |
There was a problem hiding this comment.
This except Exception as e: block is too broad. It catches any exception, which can hide bugs and make debugging difficult. It's better to catch specific exceptions that you expect to occur, such as requests.exceptions.RequestException or urllib.error.URLError.
I've noticed this pattern of using broad or bare except clauses in other places in the file as well (e.g., lines 521, 536, 581, 864, 935). Please apply a similar fix to all occurrences by catching more specific exceptions.
| except Exception as e: | |
| self._log(f"Errore richiesta: {e}", "error", force=True) | |
| return None | |
| except (requests.exceptions.RequestException, urllib.error.URLError) as e: | |
| self._log(f"Errore richiesta: {e}", "error", force=True) | |
| return None |
No description provided.