What Project Rescue Actually Looks Like: Rebuilding a Telco Platform in Kuala Lumpur

The Situation

I was brought in as lead architect on a project that had already been running for months and was visibly struggling. The client was one of Malaysia’s largest telecommunications operators. The objective was a subscriber online platform — a system that would integrate post-paid, pre-paid, and business subscriber management into a coherent whole, providing a foundation for the services the telco wanted to offer its millions of customers.

The project had the right ambition and the wrong execution. Multiple workstreams had been running in parallel — subscriber management, provisioning, billing integration, customer-facing interfaces — but not in sync. Each stream had made progress against its own internal milestones. What they had not done was progress against each other. The result was a collection of partial solutions that didn’t connect, built against assumptions that weren’t shared, in a codebase that was growing more complex and less coherent with each sprint.

This is a specific kind of project failure, distinct from the more visible kind where nothing gets built. Something gets built. The problem is that what gets built doesn’t fit together, and the gap between where the streams are and where the integrated system needs to be is not obvious until someone maps it honestly.

The First Job: Establish a Real Baseline

The instinct in a project rescue is to start fixing things. This is almost always wrong.

Before anything can be fixed, you need to know precisely what state the project is actually in — not what the status reports say, not what the stream leads believe, but what is demonstrably true. This means working through documentation, code, and integration tests with enough rigour to distinguish between “done,” “done but wrong,” “in progress,” and “not started despite being marked complete.” In a project with multiple streams and months of accumulated drift, these categories are rarely what the project dashboard suggests.
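To make the classification concrete, here is a minimal sketch in Python of how reported status can be reconciled against evidence. It is an illustration, not the tooling used on the project: the WorkItem fields, the Baseline names, and the rule that passing integration tests is what counts as evidence are all assumptions made for the example, and a plain "not started" category is added for completeness.

```python
from dataclasses import dataclass
from enum import Enum


class Baseline(Enum):
    DONE = "done"
    DONE_BUT_WRONG = "done but wrong"
    IN_PROGRESS = "in progress"
    NOT_STARTED = "not started"  # added for completeness (assumption)
    NOT_STARTED_MARKED_COMPLETE = "not started despite being marked complete"


@dataclass
class WorkItem:
    """Hypothetical record for one piece of work in one stream."""
    name: str
    reported_complete: bool       # what the dashboard or status report says
    implementation_exists: bool   # evidence: is there code or config to inspect?
    integration_tests_pass: bool  # evidence: does it work against the other streams?


def classify(item: WorkItem) -> Baseline:
    """Derive the baseline category from evidence first, reported status second."""
    if not item.implementation_exists:
        if item.reported_complete:
            return Baseline.NOT_STARTED_MARKED_COMPLETE
        return Baseline.NOT_STARTED
    if item.integration_tests_pass:
        return Baseline.DONE
    # Something was built, but it does not yet hold up against the other streams.
    if item.reported_complete:
        return Baseline.DONE_BUT_WRONG
    return Baseline.IN_PROGRESS
```

The ordering is the point of the sketch: evidence is checked before the reported status, so a green dashboard entry cannot on its own produce a "done" result.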

The baselining process in Kuala Lumpur took the better part of four weeks and was not comfortable. Some work that had been reported as complete needed to be reclassified. Some integration assumptions that multiple streams had built against turned out to be wrong. And in a few cases, work that appeared to be in progress was actually blocked, waiting on decisions that hadn’t been escalated because nobody wanted to be the one to surface them.

The output of an honest baseline is not a recovery plan. It is a true starting point. You cannot plan a route from a position you haven’t accurately identified. The four weeks felt like lost time. They weren’t. Every decision made after them was made from a real picture of where the project stood, and that clarity was worth more than four weeks of continued forward motion in the wrong direction.

The Provisioning Problem

The most acute technical problem we inherited was in the provisioning system — the component responsible for activating and configuring services on the network when a subscriber signed up, changed their plan, or reported a fault.

The original solution design had specified a particular provisioning platform for this component. That platform had not been sold to the client. A different tool had been committed to instead — a business process management platform that was capable and well-suited to workflow orchestration but had not been designed with telecoms provisioning as a primary use case.

This is the kind of situation that project retrospectives describe as “the sales process committed to the wrong technology.” It is worth being precise about what that means in delivery terms: it means you cannot go back. Renegotiating the technology commitment at the point we entered would have required acknowledging a mis-sell to a client who was already watching the project with concern, extended the timeline significantly, and potentially unravelled the commercial relationship. None of those outcomes were acceptable. The constraint was real.

So we treated it as an engineering problem rather than an excuse. The BPM platform had genuine strengths: it was excellent at modelling multi-step workflows with conditional branching, human task management, and audit trails. Subscriber provisioning, stripped back to its structure, is a complex workflow — validate the request, check eligibility, configure the network resource, update the subscriber record, trigger billing setup, confirm completion. The platform could model that workflow, manage its state, handle failures, and retry. What it couldn’t do natively, we built: the integrations to the underlying network systems, the telco-specific data transformations, the performance optimisations required to meet activation time targets.

The result was a provisioning system that worked, met the client’s requirements, and was maintainable by the client’s team after we left. It was not the system we would have designed from scratch. It was the best system achievable within the constraint we had inherited. That is a different and more honest objective for a rescue project.

Integrating OSS and BSS

The deeper architectural challenge was the integration between the client’s Operational Support Systems and Business Support Systems — the two broad families of technology that telecoms operators rely on, which have historically been developed, procured, and managed in separate organisational silos.

OSS covers the network-facing functions: provisioning, fault management, performance monitoring, network inventory. BSS covers the customer-facing and commercial functions: billing, CRM, order management, product catalogue. In an integrated subscriber platform, these two worlds have to talk to each other continuously. A customer order captured in the BSS triggers provisioning in the OSS. A network fault detected in the OSS needs to surface in the BSS to trigger service credits or support workflows. The data models are different, the systems are typically from different vendors, and the teams that own them often have limited experience working across the boundary.

We approached the integration methodically: mapping each touchpoint between OSS and BSS functions, identifying where the data models diverged and required transformation, sequencing the integration work to deliver the most critical paths first, and building a set of integration contracts that both sides could develop and test against independently. This meant more upfront design time and more structured testing than the project had previously employed. It also meant fewer surprises in the later stages, when integration failures would have been significantly more expensive to resolve.
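As an illustration of what an integration contract looks like at one touchpoint, the Python sketch below maps a BSS order onto an OSS provisioning request. Every name in it is an assumption made for the example: the field names, the product codes, the service profiles, and the mapping table are invented, and the real contracts covered far more fields than this.

```python
from dataclasses import dataclass

# Subscriber types the platform had to handle, per the project scope.
SUBSCRIBER_TYPES = {"postpaid", "prepaid", "business"}

# Hypothetical mapping from commercial product codes (BSS vocabulary)
# to network service profiles (OSS vocabulary).
PRODUCT_TO_PROFILE = {
    "POST-BASIC": "profile-voice-data",
    "PRE-STARTER": "profile-voice-only",
}


@dataclass(frozen=True)
class BssOrder:
    """What the BSS side agrees to send (illustrative fields)."""
    order_id: str
    msisdn: str
    product_code: str
    subscriber_type: str


@dataclass(frozen=True)
class OssProvisioningRequest:
    """What the OSS side agrees to accept (illustrative fields)."""
    correlation_id: str
    msisdn: str
    service_profile: str


def to_provisioning_request(order: BssOrder) -> OssProvisioningRequest:
    """Validate a BSS order against the contract and translate it for the OSS."""
    if order.subscriber_type not in SUBSCRIBER_TYPES:
        raise ValueError(f"unknown subscriber type: {order.subscriber_type!r}")
    if order.product_code not in PRODUCT_TO_PROFILE:
        raise ValueError(f"no service profile mapped for {order.product_code!r}")
    return OssProvisioningRequest(
        correlation_id=order.order_id,  # lets both sides trace the same order
        msisdn=order.msisdn,
        service_profile=PRODUCT_TO_PROFILE[order.product_code],
    )
```

With the two record types and the mapping agreed, each side can test independently: the BSS team checks that it emits valid orders, and the OSS team checks that it handles valid requests, without either waiting for the other to finish.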

The phased approach mattered as much as the technical approach. A subscriber platform that partially works — that handles post-paid subscribers reliably even if pre-paid is still being integrated — is more valuable to a telco than a platform that handles nothing while everything is built simultaneously. Delivering usable capability incrementally also rebuilt internal confidence in the project at a point when confidence was low, which had its own compounding effect on team performance.

Knowledge Transfer as a Delivery Decision

One of the mistakes that rescue projects make is treating knowledge transfer as a close-out activity — something that happens in the final weeks before the consultants leave. By that point it is too late. The client’s team inherits systems they don’t understand, which means the first production issue becomes a crisis rather than a managed event.

We approached it differently. Several members of the client’s technical team were embedded in the delivery workstreams from early in the engagement, not as observers but as working members of those streams. They attended architecture decisions, contributed to design reviews, and owned specific components with appropriate support. The goal was not to produce documentation they could read after we left. The goal was to produce engineers who had built the system and understood it from the inside.

This approach has costs. It takes more time to bring someone into a technical decision than to make the decision and document it. Engineers who are also teaching work more slowly than engineers who are only building. On a rescue project with schedule pressure, these costs are visible and immediate. The benefit — a client team capable of operating, diagnosing, and evolving the system without external support — is realised later, and is invisible in sprint velocity metrics.

The argument for it is not altruistic. It is practical. A client whose team understands the system they’re running is a client who calls you with interesting problems, not emergencies. The relationship that follows a well-executed knowledge transfer is more durable than the relationship that follows a delivery the client can’t maintain.

What Rescue Projects Teach

Greenfield projects let you establish constraints. Rescue projects force you to inherit them and work within them honestly. The skills they develop are different and, in some ways, more useful.

An honest baseline is the most valuable deliverable of the first phase. It is also the hardest to produce, because it requires surfacing uncomfortable truths about work that has already been done and reported. Do it anyway. Every decision that follows is only as good as the picture it is made against.

Constraints inherited from poor prior decisions are engineering problems, not excuses. The provisioning system we built on a suboptimal platform worked. It worked because we treated the platform’s limitations as design inputs rather than justifications for failure. The constraint shaped the solution; it did not prevent one.

OSS/BSS integration is where telecoms platforms succeed or fail. The boundary between network-facing and customer-facing systems is where complexity accumulates and where integration assumptions break under real load. It deserves more design rigour than it typically receives, and a phased delivery approach that validates the critical paths early.

Knowledge transfer embedded in delivery is a different thing from knowledge transfer at the end of delivery. One produces engineers who built the system. The other produces engineers who read about it. The first is worth the cost.

The subscriber platform launched. The client’s team ran it. That is the only measure of a rescue project that matters — not whether it delivered on time, not whether the architecture was elegant, but whether what was built could stand on its own after the people who built it left.
