Singapore is a latency test
If your product feels instant in San Francisco, Singapore will teach you what latency really costs. It is one of the most connected, digitally sophisticated markets in the world, and it sits far enough from every major cloud region to expose every assumption baked into your infrastructure. This is not a post about patriotism. It is about what happens when your architecture meets physics.
"Global" is not one thing
Most products are built with US-East or US-West as the default region. That works fine when your users are in North America or Western Europe, where round-trip times to the nearest data center hover around 20 to 50 milliseconds.

But from Singapore, the picture changes fast. A request from Singapore to US-East typically takes 220 to 260 milliseconds round trip. To US-West, it is around 170 to 200 milliseconds. Even reaching the nearest AWS region in Southeast Asia (ap-southeast-1, hosted in Singapore itself) still depends on whether the services your app calls are actually deployed there or just proxied back to a US origin.

Europe is not much better. Singapore to Frankfurt clocks in around 160 to 200 milliseconds. Singapore to London can push past 190 milliseconds. These numbers do not account for DNS resolution, TLS handshakes, or application-level processing, all of which add their own overhead.

The latency map from Southeast Asia is not a gentle slope. It is a cliff. And that cliff is where the phrase "works fine on my machine" goes to die.
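To make that overhead concrete, here is a back-of-the-envelope sketch of what the first request to a cold host costs once DNS, the TCP handshake, and a TLS 1.2-style two-round-trip handshake stack on top of the base RTT. The function name, the fixed DNS cost, and the handshake accounting are simplifying assumptions, not measurements:

```python
# Rough model of a first HTTPS request over a cold connection.
# Assumptions: 1 RTT for the TCP handshake, 2 RTTs for a TLS 1.2
# handshake, 1 RTT for the HTTP exchange itself, flat DNS cost.

def cold_request_ms(rtt_ms: float, dns_ms: float = 30.0,
                    tls_round_trips: int = 2) -> float:
    """Estimate wall-clock time for the first request to a new host."""
    tcp_handshake = rtt_ms             # SYN / SYN-ACK
    tls_handshake = tls_round_trips * rtt_ms
    http_exchange = rtt_ms             # the actual request/response
    return dns_ms + tcp_handshake + tls_handshake + http_exchange

# Roughly the article's ranges: a nearby region vs Singapore to US-East.
print(cold_request_ms(rtt_ms=20))    # 110.0 ms: feels instantaneous
print(cold_request_ms(rtt_ms=240))   # 990.0 ms: nearly a second before any work happens
```

The same base RTT appears four times in the total, which is why a 240 millisecond round trip turns into nearly a second of cold-start delay.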
The UX tax of 200 to 300 milliseconds
Jakob Nielsen's foundational research on response times established three thresholds that still hold up: 100 milliseconds feels instantaneous, 1 second keeps the user's flow of thought, and 10 seconds is the limit of attention. What falls between 100 milliseconds and 1 second is where most modern web applications live, and where latency does its quiet damage. A 200 to 300 millisecond delay on a single interaction might feel tolerable in isolation. But users do not perform single interactions. They click, type, navigate, submit, and wait, over and over. When every action in a workflow carries an extra 200 milliseconds of overhead, the cumulative effect is unmistakable. The product feels sluggish. Users lose confidence. Engagement drops. Research consistently shows that even small increases in latency lead to measurable declines in user satisfaction and conversion. Amazon famously found that every 100 milliseconds of added latency cost them 1% in sales. Google found that an extra 500 milliseconds in search page load time reduced traffic by 20%. These are not edge cases. They are the baseline reality of how humans perceive speed. For users in Singapore and across Southeast Asia, this tax is not theoretical. It is the default experience for any product that has not explicitly optimized for the region.
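The cumulative effect is simple arithmetic. A hypothetical fifteen-interaction workflow, with illustrative (not measured) per-step timings, shows how the per-request tax adds up:

```python
# Hypothetical workflow of 15 interactions (clicks, saves, navigations),
# each needing one synchronous round trip. Timings are illustrative.

def workflow_seconds(interactions: int, server_ms: float, rtt_ms: float) -> float:
    """Total wall-clock time when every interaction waits on the server."""
    return interactions * (server_ms + rtt_ms) / 1000.0

near = workflow_seconds(15, server_ms=80, rtt_ms=30)    # user near the region
far  = workflow_seconds(15, server_ms=80, rtt_ms=240)   # Singapore user, US-East backend

print(f"near: {near:.2f}s, far: {far:.2f}s, tax: {far - near:.2f}s")
# near: 1.65s, far: 4.80s, tax: 3.15s
```

No single interaction crosses Nielsen's one-second threshold, yet the distant user spends more than three extra seconds waiting across the same workflow.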
Why agentic systems make it worse
The rise of agentic AI, systems that chain multiple LLM calls, tool invocations, and retrieval steps together, has introduced a new class of latency problem. Each step in an agentic workflow adds its own delay: prompt processing (50 to 200 milliseconds), embedding and retrieval (50 to 500 milliseconds), tool execution (variable), and token generation (proportional to output length). A simple question-and-answer interaction might involve one LLM call. An agentic workflow might involve five or ten, each waiting on the previous step. If each step adds 300 milliseconds of network latency on top of processing time, a workflow that takes 3 seconds in Virginia takes 6 or 7 seconds from Singapore. That is the difference between "fast enough" and "I'll just do it manually." This compounding effect is not limited to AI products. Any architecture that relies on sequential API calls, microservice chains, or multi-hop data fetching will exhibit the same pattern. But agentic systems are particularly exposed because the number of hops is often unpredictable and difficult to optimize at the application layer. The latency is not in any single call. It is in the multiplication.
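The multiplication is easy to model. A minimal sketch, assuming ten strictly sequential steps with illustrative timings chosen to match the 3-versus-6-second example above:

```python
# Sequential agent chain: every step waits on the previous one, so any
# extra network latency multiplies with the hop count. Step counts and
# timings are illustrative.

def chain_seconds(steps: int, processing_ms: float, network_ms: float) -> float:
    """Total latency of a strictly sequential chain of LLM/tool calls."""
    return steps * (processing_ms + network_ms) / 1000.0

virginia  = chain_seconds(steps=10, processing_ms=290, network_ms=10)
singapore = chain_seconds(steps=10, processing_ms=290, network_ms=310)

print(virginia)   # 3.0 s: the "fast enough" experience
print(singapore)  # 6.0 s: same workflow, +300 ms of network per hop
```

Nothing in the application code changed between the two runs; only the per-hop network cost did.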
Infrastructure choices that actually matter
Solving latency from Southeast Asia is not about picking one magic cloud region. It is about a set of deliberate infrastructure decisions.

Region placement is the most obvious lever. Deploying compute and data in ap-southeast-1 (Singapore) or nearby regions like ap-southeast-5 (Malaysia) eliminates the cross-ocean round trip for users in the region. But this only works if the entire request path is regional, not just the entry point. A load balancer in Singapore that proxies to a database in US-East has not solved anything.

Edge caching and CDNs can dramatically reduce latency for static and semi-static content. Serving assets, API responses, and pre-rendered pages from edge nodes in Singapore means the data travels meters, not thousands of kilometers. Services like CloudFront, Cloudflare, and Fastly all have points of presence in Singapore.

Queue-based architectures decouple user-facing interactions from heavy backend processing. Instead of making the user wait for a synchronous round trip, accept the request, return immediately, and process asynchronously. This is especially important for agentic or AI-powered features where backend processing is inherently slow.

Connection pooling and keep-alive reduce the overhead of establishing new connections. A TLS handshake to a distant server can add 100 to 200 milliseconds on its own. Reusing connections eliminates that cost for subsequent requests.

None of these are novel. But the discipline of applying them consistently across every service in the stack is what separates products that work globally from products that work in California.
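The queue-based pattern can be sketched in a few lines of standard-library Python. This is a minimal, illustrative version, not a production design; the handler and job names are hypothetical:

```python
# Queue-based decoupling: the user-facing handler enqueues work and
# returns immediately; a background worker does the slow processing.
import queue
import threading
import time

jobs: "queue.Queue[str]" = queue.Queue()
results: list[str] = []

def slow_backend_job(payload: str) -> None:
    time.sleep(0.05)          # stand-in for a slow LLM or cross-region call
    results.append(f"done:{payload}")

def worker() -> None:
    while True:
        payload = jobs.get()
        slow_backend_job(payload)
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

def handle_request(payload: str) -> str:
    """User-facing path: enqueue and acknowledge without waiting."""
    jobs.put(payload)
    return "accepted"         # the user sees this immediately

print(handle_request("generate-report"))  # prints "accepted" right away
jobs.join()                               # only tests wait; users do not
print(results)                            # ['done:generate-report']
```

The user's round trip is now the cost of a queue insert plus one network hop, regardless of how long the backend work takes. In production the in-process queue would be a durable service like SQS or Kafka, with status delivered via polling or push.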
Product choices that matter just as much
Infrastructure alone is not enough. The product layer has to meet users where the physics leaves off.

Async UX patterns are the most impactful design choice for high-latency environments. Instead of showing a spinner while waiting for a server response, show an optimistic update. Let the user move forward while the system catches up. Email clients have done this for decades. Modern collaborative tools are starting to catch on.

Progressive disclosure reduces the amount of data that needs to load upfront. Instead of fetching an entire dashboard on page load, load the critical content first and fill in details as the user navigates deeper. This turns one large, slow request into several smaller, faster ones.

Retry and fallback logic becomes more important when network conditions are less predictable. Southeast Asian internet infrastructure is generally excellent in urban areas, but cross-border routing can be inconsistent. Graceful retries, idempotent requests, and meaningful error states turn flaky connections into manageable ones.

Prefetching and speculative loading can hide latency entirely. If you can predict what the user is likely to click next, start loading it before they click. This is especially effective in workflows with predictable navigation patterns.

These are product decisions, not infrastructure decisions. They require designers and engineers to internalize the constraint that not every user is 20 milliseconds from the server.
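The optimistic-update pattern is simple to sketch. Here is a minimal in-memory version, assuming a hypothetical client-side store that snapshots state before each local write so a server rejection can be rolled back:

```python
# Optimistic update sketch: apply the change locally right away, then
# reconcile when the server responds, rolling back on failure.
# The store class and keys are hypothetical.
import copy

class OptimisticStore:
    def __init__(self) -> None:
        self.state: dict[str, str] = {}
        self._snapshots: dict[int, dict[str, str]] = {}
        self._next_id = 0

    def apply_optimistic(self, key: str, value: str) -> int:
        """Update the UI state immediately; remember how to undo it."""
        op_id = self._next_id
        self._next_id += 1
        self._snapshots[op_id] = copy.deepcopy(self.state)
        self.state[key] = value
        return op_id

    def confirm(self, op_id: int) -> None:
        """Server accepted the write: drop the undo snapshot."""
        self._snapshots.pop(op_id, None)

    def rollback(self, op_id: int) -> None:
        """Server rejected the write: restore the pre-update state."""
        self.state = self._snapshots.pop(op_id)

store = OptimisticStore()
op = store.apply_optimistic("title", "Q3 report")  # user sees it instantly
# ... a few hundred milliseconds later the server replies ...
store.confirm(op)                                  # ack: nothing to do
op2 = store.apply_optimistic("title", "Q4 report")
store.rollback(op2)                                # nack: quietly revert
print(store.state)                                 # {'title': 'Q3 report'}
```

The user never waits on the round trip; the latency is only visible in the rare rollback case, which good UX surfaces with a clear, recoverable error state.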
The Singapore angle
Singapore is a small market, roughly 6 million people. But it punches far above its weight in digital adoption, mobile penetration, and user expectations. Singaporean users are accustomed to fast, well-designed products. They have high standards and low tolerance for friction. This makes Singapore an exceptional litmus test. If your product feels good in Singapore, it will feel good almost anywhere. If it feels sluggish in Singapore, you have a global problem that your US-based team simply has not noticed yet.

There is also a hiring dimension to this. As more engineering teams become globally distributed, the tools those teams use need to work well across time zones and regions. A CI/CD pipeline that takes 30 seconds in Oregon but 90 seconds in Singapore is not just an annoyance. It is a productivity tax on every developer in the region. A real-time collaboration tool with noticeable lag across the Pacific is not just a UX issue. It is a coordination problem that compounds across every meeting, every code review, and every shared document. Distributed teams are not just an HR challenge. They are a product problem.
Latency is a design constraint, not a bug
The temptation is to treat latency as a backend metric, something to optimize after the product works. But for users outside the US-West bubble, latency is not an afterthought. It is the first thing they feel. Building for Singapore, and by extension for Southeast Asia, South Asia, and other regions far from major cloud hubs, means treating latency as a first-class design constraint. It means choosing regions deliberately, caching aggressively, designing for async workflows, and testing from the places your users actually are. The good news is that the tools exist. The cloud providers have regions in Singapore. The CDNs have edge nodes. The design patterns for async and optimistic UX are well documented. The hard part is caring enough to use them. Singapore is not a niche market. It is a latency test for whether your product is truly global.
References
- Nielsen, J. (1993). "Response Times: The 3 Important Limits." Nielsen Norman Group. https://www.nngroup.com/articles/response-times-3-important-limits/
- AWS Region Latency Matrix. Cloudping. https://www.cloudping.co/
- "The Latent Threat of Latency: Why 300ms Matters More Than You Think." Duckweave, Medium (2025). https://medium.com/@duckweave/the-latent-threat-of-latency-why-300ms-matters-more-than-you-think-7e6ad8ee4802
- "Understanding Latency in Multi-Agent GenAI Systems." Rajesh Srivastava, Medium. https://medium.com/@raj-srivastava/understanding-latency-in-multi-agent-genai-systems-1000dd34f6c4
- "9 Tips for Reducing API Latency in Agentic AI Systems." Nordic APIs. https://nordicapis.com/9-tips-for-reducing-api-latency-in-agentic-ai-systems/
- "A Practical Guide to Reducing Latency and Costs in Agentic AI Applications." Georgian. https://georgian.io/reduce-llm-costs-and-latency-guide
- "API Latency: Why 200ms Feels Like Forever." APIVerve (2026). https://blog.apiverve.com/post/api-latency-why-200ms-feels-like-forever
- AWS Global Infrastructure. Amazon Web Services. https://aws.amazon.com/about-aws/global-infrastructure/
- Google Cloud Regions and Zones Documentation. https://docs.cloud.google.com/compute/docs/regions-zones