The Gravity of History: Mastering Repository, Issue, and PR Migration

In the high-stakes world of enterprise software, moving a codebase is rarely just about git push --mirror. Copying the source code is the easy part. Migrating the context (the years of architectural debates in Pull Requests, the historical bug tracking in Issues, and the institutional knowledge embedded in project boards) is where senior engineers prove their worth.

A “lift and shift” approach often fails because it ignores the metadata. When you migrate from GitLab, Bitbucket, or even between GitHub Organizations, you are moving a living ecosystem. If you lose the links between a PR and the issue it solved, or if timestamps are reset to the day of the migration, you effectively lobotomize your project’s history. This makes debugging “why was this line added three years ago?” nearly impossible.

The Expert’s Strategy

Modern migrations must prioritize Identity Mapping and Metadata Integrity. Real-world migration workflows now leverage the GitHub Enterprise Importer (GEI) or the GraphQL API to ensure that “User A” on the old system is correctly attributed to “User A” on GitHub. Without this, your history becomes a ghost town of placeholder accounts (GEI calls them “mannequins”) or generic “system-migrator” attributions.

Common Pitfall: Ignoring CI/CD secrets and Webhooks. Engineers often move the repo but forget that the deployment pipeline is hardcoded to the old repository URL or relies on secrets stored in the old platform’s environment variables. A successful migration is only complete when the first “Green Build” lands on the target destination.
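A minimal sketch of that clean-up step in Python, assuming every old clone URL shares one prefix. The hosts `bitbucket.example.com` and `github.com/acme` are hypothetical placeholders. Running something like this over pipeline files, READMEs, and submodule configs gives you a count of rewritten URLs; a non-zero count on a file you thought was already migrated means it still points at the old platform.

```python
import re

# Hypothetical prefixes: substitute your real source host/project and target org.
OLD_PREFIX = "https://bitbucket.example.com/scm/acme/"
NEW_PREFIX = "https://github.com/acme/"

# Old prefix + repo name + optional ".git", ending at whitespace, a quote, "/" or end of text.
OLD_URL = re.compile(
    re.escape(OLD_PREFIX) + r"([A-Za-z0-9._-]+?)(\.git)?(?=[\s\"'/]|$)"
)


def rewrite_remote_urls(text: str) -> tuple[str, int]:
    """Point old-platform repository URLs in a config file at the new org.

    Returns the rewritten text and how many URLs were changed.
    """
    return OLD_URL.subn(
        lambda m: NEW_PREFIX + m.group(1) + (m.group(2) or ""), text
    )
```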


Study Guide: GitHub Migration Ecosystem

Repository, Issue, and PR migration is the process of transferring a software project’s entire lifecycle data from a source (e.g., On-premise Server, Bitbucket, GitLab) to a destination (GitHub.com or GitHub Enterprise), preserving as much relational data as possible.

The “Digital Library” Analogy

Imagine moving a massive city library to a new building. You don’t just move the books (the Code). You also need to move the checkout logs (Commit History), the suggestion box notes (Issues), the librarian’s peer-review notes on new acquisitions (Pull Requests), and the library cards (User Identity). If you just move the books, you have the knowledge, but you’ve lost the community’s relationship with it.

Core Concepts & Terminology

  • Source vs. Target: The origin platform and the destination GitHub organization.
  • Metadata: Non-code data including labels, milestones, assignees, and timestamps.
  • Identity Mapping: Matching each source account to its GitHub account, usually recorded in a CSV or JSON file that maps old_username to github_username.
  • Rewriting History: Using tools like git-filter-repo to remove large binaries or sensitive data before migrating.
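A minimal sketch of loading and checking such a mapping, assuming a CSV whose header row is literally old_username,github_username. Those column names come from the definition above and are illustrative; GEI’s own mannequin-reclaim workflow uses its own CSV layout. Running the check against every author and commenter in the source export shows, before the dry run, how many contributions would land on placeholder accounts.

```python
import csv
import io


def load_identity_map(csv_text: str) -> dict[str, str]:
    """Parse old_username,github_username rows into {old: new}.

    Raises ValueError if the same source user is mapped to two different handles.
    """
    mapping: dict[str, str] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        old = row["old_username"].strip()
        new = row["github_username"].strip()
        if old in mapping and mapping[old] != new:
            raise ValueError(f"conflicting mappings for {old!r}")
        mapping[old] = new
    return mapping


def unmapped_users(source_users: set[str], mapping: dict[str, str]) -> set[str]:
    """Source accounts with no GitHub handle: these become placeholder attributions."""
    return {user for user in source_users if not mapping.get(user)}
```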

Typical Migration Workflow

  1. Audit: Identify which repos, issues, and PRs are active. Archive stale ones.
  2. Preparation: Create the target Organization and set up Team structures.
  3. Dry Run: Use the gh-gei extension to run a test migration of a medium-sized repo.
  4. Freeze Period: Set the source repository to “Read-Only” to prevent new commits during the move.
  5. Execution: Run migration scripts for code, then issues, then PRs.
  6. Validation: Check PR comment counts and commit hashes.
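The comment-count half of step 6 reduces to a dictionary comparison. This sketch assumes you have already exported {pr_number: comment_count} from both sides (how you collect them, for example from each platform’s API, is out of scope) and that PR numbers have been translated if the target renumbered them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mismatch:
    pr: int
    source_count: int
    target_count: Optional[int]  # None: the PR is missing on the target


def compare_comment_counts(source: dict[int, int], target: dict[int, int]) -> list[Mismatch]:
    """List every PR whose comment count differs between source and target."""
    return [
        Mismatch(pr, count, target.get(pr))
        for pr, count in sorted(source.items())
        if target.get(pr) != count
    ]
```

An empty list means every PR arrived with the same number of comments; anything else is a concrete to-do list for the validation step.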

Real-World Scenarios

Scenario 1: The “Small Team” Shift

Context: A startup moving from a private Bitbucket instance to GitHub to leverage GitHub Actions.

Application: Use the built-in GitHub Importer tool in the UI. It handles the git clone and git push automatically, but it imports code and commit history only, not issues or pull requests.

Outcome: Quick and easy, but limited. Fine for 5-10 developers who can live without their old issues and PR review history.

Scenario 2: The “Enterprise Consolidation”

Context: A Fortune 500 company merging five GitHub Organizations into one central Enterprise account.

Application: Using the GitHub Enterprise Importer (GEI) via the CLI. Scripts are used to map thousands of users across LDAP and SAML identities.

Outcome: High fidelity. Preserves PR reviews, comments, and project boards. Risk: High complexity in managing conflicting repository names.
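One way to tame the naming risk is to compute every target name up front. The sketch below keeps names that are unique across the source orgs and prefixes colliding ones with their source org; that org-prefix convention is an illustrative choice, not something GEI does for you. Collisions are checked case-insensitively because GitHub does not allow two repositories under one owner whose names differ only in case.

```python
from collections import defaultdict


def plan_target_names(repos_by_org: dict[str, list[str]]) -> dict[tuple[str, str], str]:
    """Map (source_org, repo) to a repository name in the consolidated org."""
    orgs_using = defaultdict(set)  # lowercased repo name -> source orgs that use it
    for org, repos in repos_by_org.items():
        for repo in repos:
            orgs_using[repo.lower()].add(org)

    plan = {}
    for org, repos in repos_by_org.items():
        for repo in repos:
            collides = len(orgs_using[repo.lower()]) > 1
            plan[(org, repo)] = f"{org}-{repo}" if collides else repo

    # A prefixed name can itself clash with an existing name, so check once more.
    taken: dict[str, tuple[str, str]] = {}
    for source, name in plan.items():
        if name.lower() in taken:
            raise ValueError(f"{source} and {taken[name.lower()]} both map to {name!r}")
        taken[name.lower()] = source
    return plan
```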

Interview Questions

  1. How do you migrate a repository while preserving commit hashes?

    Use git clone --mirror from the source and git push --mirror to the target. This ensures all refs, tags, and branches are identical, maintaining the integrity of the commit hashes.

  2. What is the biggest challenge when migrating Issues from GitLab to GitHub?

    Identity mapping and internal references. If an issue mentions #123, that ID might change in the new system. Also, user mentions (@user) must be updated to match GitHub handles.

  3. When should you use git-filter-repo during a migration?

    When you need to “clean” the history—specifically removing accidentally committed secrets, large .zip files, or sensitive data that shouldn’t exist in the new target repo.

  4. How do you handle “Downtime” during a large-scale migration?

    By implementing a “Code Freeze.” You set the source repo to Read-Only, perform the final sync, update CI/CD pointers, and then “Unlock” the target repo for the team.

  5. What happens to PRs that are “Open” during a migration?

    Ideally, they should be merged or closed before migration. If moved via GEI, they remain open on the target, but the underlying “head” and “base” branches must also be migrated for the PR to remain valid.

  6. Why is the GitHub Enterprise Importer (GEI) preferred over manual scripts?

    GEI is a GitHub-supported tool that carries over high-fidelity data (like review comments and timestamps) that is difficult to capture via standard git commands or basic API calls.

  7. How do you migrate GitHub Actions secrets?

    Secrets are not migrated: GitHub’s API never returns secret values, and migration tools do not copy them. They must be re-created on the target repository or organization using the CLI, the API, or the web UI.

  8. What is a “Sidecar” migration?

    Migrating non-git data like Wiki pages, Releases, and Project Boards alongside the main repository code.

  9. How do you handle LFS (Large File Storage) objects during migration?

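    Install Git LFS on the machine running the migration (git lfs install) and make sure the target has enough LFS storage. Then run git lfs fetch --all against the source and git lfs push --all <target-remote> so the actual large blobs are moved, not just the pointers.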

  10. What is the risk of “Squash and Merge” history during migration?

    If the source used “Squash and Merge” exclusively, the migration is simpler as there are fewer commits. However, if you are trying to reconstruct a branch-based history into a mono-repo, you may lose the granular context of how features were built.
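The commands from question 1 can be wrapped so the same script also verifies the result. This Python sketch assumes git is on the PATH and that you capture `git show-ref` output from a mirror clone of each side. If the source is itself GitHub, a mirror clone also fetches the read-only refs/pull/* refs, which GitHub rejects on push (so the push exits with an error and `run_mirror` raises unless you delete those refs first); the comparison ignores them.

```python
import subprocess


def mirror_commands(source_url: str, target_url: str, workdir: str = "repo.git") -> list[list[str]]:
    """The clone/push pair from question 1, as argv lists."""
    return [
        ["git", "clone", "--mirror", source_url, workdir],
        ["git", "-C", workdir, "push", "--mirror", target_url],
    ]


def run_mirror(source_url: str, target_url: str, workdir: str = "repo.git") -> None:
    for argv in mirror_commands(source_url, target_url, workdir):
        subprocess.run(argv, check=True)  # stop at the first failing step


def parse_show_ref(output: str) -> dict[str, str]:
    """Turn `git show-ref` output ("<sha> <refname>" per line) into {ref: sha}."""
    refs = {}
    for line in output.splitlines():
        if line.strip():
            sha, ref = line.split(maxsplit=1)
            refs[ref] = sha
    return refs


def diff_refs(source: dict[str, str], target: dict[str, str],
              ignore: tuple[str, ...] = ("refs/pull/",)) -> dict[str, tuple]:
    """Refs whose hashes differ or that exist on only one side (None = missing)."""
    names = {r for r in source.keys() | target.keys() if not r.startswith(ignore)}
    return {r: (source.get(r), target.get(r))
            for r in sorted(names) if source.get(r) != target.get(r)}
```

An empty result from diff_refs means every branch and tag landed with identical hashes.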
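For question 2, rewriting issue bodies and comments can be sketched with two regular expressions. Assumptions: you already have a user map (source username to GitHub handle) and an issue-number map (old number to new number); both are hypothetical inputs here. Unmapped mentions are wrapped in backticks so they cannot notify an unrelated GitHub user who happens to own that handle, while unmapped issue numbers are left as-is on the assumption that numbering was preserved.

```python
import re

# @handle not preceded by a word character or "@" (so e-mail addresses are skipped);
# the handle must start and end with a letter or digit, so trailing punctuation is excluded.
MENTION = re.compile(r"(?<![\w@])@([A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?)")
# #123 not preceded by a word character, "&" (HTML entities) or "/" (URL fragments).
ISSUE_REF = re.compile(r"(?<![\w&/])#(\d+)\b")


def rewrite_body(body: str, users: dict[str, str], issues: dict[int, int]) -> str:
    """Translate @mentions and #references in one issue/PR body or comment."""

    def user_repl(m: re.Match) -> str:
        handle = users.get(m.group(1))
        # Unmapped: render as code so GitHub does not notify a stranger with that handle.
        return f"@{handle}" if handle else f"`@{m.group(1)}`"

    def issue_repl(m: re.Match) -> str:
        old = int(m.group(1))
        return f"#{issues.get(old, old)}"

    return ISSUE_REF.sub(issue_repl, MENTION.sub(user_repl, body))
```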
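The clean-up in question 3 usually comes down to a few git-filter-repo options. This sketch only assembles and runs the command; the size threshold and paths are yours to choose. Keep in mind that git-filter-repo refuses to rewrite a repository that does not look like a fresh clone unless you pass --force, and that every rewritten commit gets a new hash.

```python
import subprocess
from typing import Optional, Sequence


def filter_repo_args(max_blob_size: Optional[str] = None,
                     paths_to_remove: Sequence[str] = (),
                     replace_text_file: Optional[str] = None) -> list[str]:
    """Build a `git filter-repo` invocation for common pre-migration clean-ups."""
    args = ["git", "filter-repo"]
    if max_blob_size:
        # Drop every blob above this size from all history, e.g. "10M".
        args += ["--strip-blobs-bigger-than", max_blob_size]
    for path in paths_to_remove:
        args += ["--path", path]
    if paths_to_remove:
        # --invert-paths turns the --path list into "remove these" instead of "keep only these".
        args.append("--invert-paths")
    if replace_text_file:
        # File of literal strings (or "old==>new" lines) to redact from file contents.
        args += ["--replace-text", replace_text_file]
    return args


def run_filter_repo(repo_dir: str, **options) -> None:
    subprocess.run(filter_repo_args(**options), cwd=repo_dir, check=True)
```

For example, run_filter_repo("payments", max_blob_size="10M", paths_to_remove=["secrets.env"]) would strip large blobs and one hypothetical secrets file in a single pass.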
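Question 5’s precondition is easy to check mechanically before cut-over. The sketch assumes you have a list of open PRs with their head and base branch names, plus the set of branch names that exist on the target; PRs opened from forks are out of scope, since their head branch lives in another repository.

```python
from typing import NamedTuple


class OpenPR(NamedTuple):
    number: int
    head: str  # branch the changes come from
    base: str  # branch the PR merges into


def prs_missing_branches(prs: list[OpenPR], target_branches: set[str]) -> list[tuple[int, list[str]]]:
    """Open PRs whose head or base branch does not exist on the target."""
    report = []
    for pr in prs:
        missing = [branch for branch in (pr.head, pr.base) if branch not in target_branches]
        if missing:
            report.append((pr.number, missing))
    return report
```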
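Re-creating secrets for question 7 can be scripted with the GitHub CLI’s `gh secret set`, which reads the value from standard input when no `--body` is given. This sketch assumes, hypothetically, that each value has been exported to an environment variable of the same name (for example from your vault); it refuses to run if any value is missing so you never end up with a half-configured repository.

```python
import os
import subprocess
from typing import Mapping, Sequence


def secret_commands(names: Sequence[str], repo: str) -> list[list[str]]:
    """One `gh secret set` call per secret; the value is supplied on stdin."""
    return [["gh", "secret", "set", name, "--repo", repo] for name in names]


def missing_values(names: Sequence[str], env: Mapping[str, str]) -> list[str]:
    """Secret names with no (or an empty) value available."""
    return [name for name in names if not env.get(name)]


def restore_secrets(names: Sequence[str], repo: str, env: Mapping[str, str] = os.environ) -> None:
    missing = missing_values(names, env)
    if missing:
        raise RuntimeError("no value available for: " + ", ".join(missing))
    for name, argv in zip(names, secret_commands(names, repo)):
        # Passing the value via stdin keeps it out of the process list and shell history.
        subprocess.run(argv, input=env[name], text=True, check=True)
```

For organization-level secrets, `gh secret set` takes `--org ORG` in place of `--repo`.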
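To confirm question 9’s point that the blobs, and not just the pointers, arrived, you can inspect files in a fresh checkout of the target after `git lfs pull`: a file that still parses as a pointer is a strong sign its object never reached the target. The sketch recognizes the Git LFS pointer format (a version line, an `oid sha256:` line, and a `size` line); the 1024-byte cutoff is a heuristic, since real pointer files are far smaller.

```python
import re
from typing import NamedTuple, Optional

LFS_VERSION_LINE = "version https://git-lfs.github.com/spec/v1"


class LfsPointer(NamedTuple):
    oid: str   # SHA-256 of the real file content
    size: int  # size of the real file in bytes


def parse_lfs_pointer(data: bytes) -> Optional[LfsPointer]:
    """Return the pointer fields if `data` is a Git LFS pointer file, else None."""
    if len(data) > 1024:  # heuristic: real pointers are a few hundred bytes at most
        return None
    try:
        lines = data.decode("utf-8").splitlines()
    except UnicodeDecodeError:
        return None  # binary content, so this is the actual object
    if not lines or lines[0] != LFS_VERSION_LINE:
        return None
    fields = {}
    for line in lines[1:]:
        key, _, value = line.partition(" ")
        fields[key] = value
    oid, size = fields.get("oid", ""), fields.get("size", "")
    if not re.fullmatch(r"sha256:[0-9a-f]{64}", oid) or not size.isdigit():
        return None
    return LfsPointer(oid[len("sha256:"):], int(size))
```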

Interview Tips & Golden Nuggets

  • The “Immutable History” Trap: If an interviewer asks whether you can change a commit message during migration, mention git-filter-repo (or git filter-branch, which Git’s own documentation now discourages in favor of git-filter-repo), but warn that rewriting changes commit hashes and breaks existing forks and clones.
  • Rebase vs. Merge in Migration: Senior engineers know that migrating a repo with a “Merge” history is easier than “Rebase” history because merge commits provide clear “points in time” for the migration audit.
  • API Rate Limits: When discussing large migrations (1000+ issues), always mention Rate Limiting. Explain how you’d implement exponential backoff in your migration script.
  • Audit Logs: Mention that after migration, checking the GitHub Audit Log is the best way to verify that permissions were set correctly on the new repository.
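For the rate-limit tip, a minimal retry wrapper with exponential backoff and full jitter looks like this. GitHub signals rate limiting with HTTP 403 or 429 responses and may include a Retry-After header; the `RateLimited` exception is a hypothetical hook that your own API client would raise on those responses, carrying that header’s value when present.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical: raised by your API client on a 403/429 rate-limit response."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds from a Retry-After header, if any


def with_backoff(call: Callable[[], T], max_attempts: int = 6, base: float = 1.0,
                 cap: float = 60.0, sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `call()`, retrying on RateLimited with exponentially growing, jittered waits."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited as exc:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the error to the caller
            if exc.retry_after is not None:
                delay = exc.retry_after  # the server told us how long to wait
            else:
                # Full jitter: uniform in [0, min(cap, base * 2**attempt)].
                delay = random.uniform(0, min(cap, base * 2 ** attempt))
            sleep(delay)
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps the wrapper testable; jitter spreads retries from many parallel migration workers so they do not hit the API in lockstep.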

Comparison: Migration Methods

| Method | Best For | Strengths | Limitations |
| --- | --- | --- | --- |
| GitHub Importer (UI) | Small, simple repos | Zero setup, easy for beginners | No fine-grained control; fails on large repos |
| Mirror Push (CLI) | Code-only migrations | 100% hash accuracy | No Issue/PR/metadata migration |
| GEI (CLI Extension) | Enterprise-grade moves | High fidelity (PRs, Issues, Reviews) | Requires CLI knowledge; supports only specific source platforms |
| Custom API Scripts | Edge cases / bespoke data | Maximum flexibility | High development effort; rate-limit issues |

Migration Architecture & Workflow

SOURCE (Bitbucket/GitLab) → MIGRATION ENGINE (GEI / API: 1. Identity Mapping, 2. Blob/LFS Transfer, 3. Metadata Reconstitution) → TARGET (GitHub Org)

Repo Ecosystem

  • Mirror branches & tags.
  • Migrate Git LFS objects, not just pointers.
  • Update README links.

Collaboration

  • Map PR reviewers.
  • Preserve “Threaded” comments.
  • Carry over Labels/Milestones.

Automation

  • Re-link GitHub Actions.
  • Update Webhook endpoints.
  • Rotate Repository Secrets.

Decision Guidance: Which Tool?

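  • Is it just code? → git push --mirror
  • Moving from Bitbucket Server/Data Center to GitHub? → GitHub Enterprise Importer (the gh bbs2gh extension). GitLab and Bitbucket Cloud are not GEI sources, so plan on custom API scripts for full fidelity.
  • Need to change history (remove secrets)? → git-filter-repo
  • Moving between two GitHub Orgs? → gh gei migrate-repo
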
Production Use Case:

A global bank migrated 4,000 repositories from an on-premise Bitbucket server to GitHub Enterprise Cloud. By using the GEI and a custom Python wrapper for identity mapping, they maintained 100% of their PR audit trails—a regulatory requirement—while reducing their migration window from 48 hours to just 4 hours per business unit.
