Filtered by JavaScript, Python

Page 12

Reset

Find out all localStorage keys and their value sizes

July 13, 2019
0 comments Web development, JavaScript

I use localhost:3000 for a lot of different projects. It's the default port on create-react-app's dev server. The browser profile remains but projects come and go. There's a lot of old stuff in there that I have no longer any memory of adding.

My Storage tab in Firefox

Working in a recent single page app, I tried to use localStorage as a cache for some XHR requests and got: DOMException: "The quota has been exceeded.".
Wat?! I'm only trying to store a ~250KB JSON string. Surely that's far away from the mythical 5MB limit. Do I really have to lzw compress the string in and out to save room and pay for it in CPU cycles?

Better yet, find out what junk I still have in there.

Paste this into your Web Console (it's safe as milk):


Object.entries(localStorage).forEach(([k,v]) => console.log(k.padEnd(50), v.length, (v.length / 1024).toFixed(1) + 'KB'))

The output looks something like this:

Web Console output

Or, sorted and filtered a bit:


Object.entries(localStorage).sort((a, b) => b[1].length -a[1].length).slice(0,5).forEach(
([k,v]) => console.log(k.padEnd(50), v.length, (v.length / 1024).toFixed(1) + 'KB'));

Looks like this:

Sorted and sliced

And for the record, summed total in kilobytes:


(Object.values(localStorage).map(x => x.length).reduce((a, b) => a + b) / 1024).toFixed(1) + 'KB';

Summed in KB

Wrapping up

Seems my Firefox browser's localStorage limit is still 5MB.

Also, you can do the loop using localStorage.length and localStorage.key(n) and localStorage.getItem(localStorage.key(n)).length but using Object.entries(localStorage) seems neater.

I guess this means I can still use localStorage in my app. It seems I just need to localStorage.removeItem('massive-list:items') which sounds like an experiment, from eons ago, for seeing how much I can stuff in there.

SongSearch autocomplete rate now 2+ per second

July 11, 2019
0 comments Django, Python, Nginx, Redis

By analyzing my Nginx logs, I've concluded that SongSearch's autocomplete JSON API now gets about 2.2 requests per second. I.e. these are XHR requests to /api/search/autocomplete?q=....

Roughly, 1.8 requests per second goes back to the Django/Elasticsearch backend. That's a hit ratio of 16%. These Django/Elasticsearch requests take roughly 200ms on average. I suspect about 150-180ms of that time is spent querying Elasticsearch, the rest being Python request/response and JSON "paperwork".

Autocomplete counts in Datadog

Caching strategy

Caching is hard because the queries are so vastly different over time. Had I put a Redis cache decorator on the autocomplete Django view function I'd quickly bloat Redis memory and cause lots of evictions.

What I used to do was something like this:


def search_autocomplete(request):
   q = request.GET.get('q') 

   cache_key = None
   if len(q) < 10:
      cache_key = 'autocomplete:' + q
      results = cache.get(cache_key)
      if results is not None:
          return http.JsonResponse(results)

   results = _do_elastisearch_query(q)
   if cache_key:
       cache.set(cache_key, results, 60 * 60)

   return http.JsonResponse(results)   

However, after some simple benchmarking it was clear that using Nginx' uwsgi_cache it was much faster to let the cacheable queries terminate already at Nginx. So I changed the code to something like this:


def search_autocomplete(request):
   q = request.GET.get('q') 
   results = _do_elastisearch_query(q)
   response = http.JsonResponse(results)   

   if len(q) < 10:
       patch_cache_control(response, public=True, max_age=60 * 60)

   return response

The only annoying thing about Nginx caching is that purging is hard unless you go for that Nginx Plus (or whatever their enterprise version is called). But more annoying, to me, is that fact that I can't really see what this means for my server. When I was caching with Redis I could just use redis-cli and...

> INFO
...
# Memory
used_memory:123904288
used_memory_human:118.16M
...

Nginx Amplify

My current best tool for keeping an eye on Nginx is Nginx Amplify. It gives me some basic insights about the state of things. Here are some recent screenshots:

NGINX Requests/s

NGINX Memory Usage

NGINX CPU Usage %

Thoughts and conclusion

Caching is hard. But it's also fun because it ties directly into performance work.

In my business logic, I chose that autocomplete queries that are between 1 and 9 characters are cacheable. And I picked a TTL of 60 minutes. At this point, I'm not sure exactly why I chose that logic but I remember doing some back-of-envelope calculations about what the hit ratio would be and roughly what that would mean in bytes in RAM. I definitely remember picking 60 minutes because I was nervous about bloating Nginx's memory usage. But as of today, I'm switching that up to 24 hours and let's see what that does to my current 16% Nginx cache hit ratio. At the moment, /var/cache/nginx-cache/ is only 34MB which isn't much.

Another crux with using uwsgi_cache (or proxy_cache) is that you can't control the cache key very well. When it was all in Python I was able to decide about the cache key myself. A plausible implementation is cache_key = q.lower().strip() for example. That means you can protect your Elasticsearch backend from having to do {"q": "A"} and {"q": "a"}. Who knows, perhaps there is a way to hack this in Nginx without compiling in some Lua engine.

The ideal would be some user-friendly diagnostics tool that I can point somewhere, towards Nginx, that says how much my uwsgi_cache is hurting or saving me. Autocomplete is just one of many things going on on this single DigitalOcean server. There's also a big PostgreSQL server, a node-express cluster, a bunch of uwsgi workers, Redis, lots of cron job scripts, and of course a big honking Elasticsearch 6.

UPDATE (July 12 2019)

Currently, and as mentioned above, I only set Cache-Control headers (which means Nginx snaps it up) for queries that at max 9 characters long. I wanted to appreciate and understand how ratio of all queries are longer than 9 characters so I wrote a report and its output is this:

POINT: 7
Sum show 75646 32.2%
Sum rest 159321 67.8%

POINT: 8
Sum show 83702 35.6%
Sum rest 151265 64.4%

POINT: 9
Sum show 90870 38.7%
Sum rest 144097 61.3%

POINT: 10
Sum show 98384 41.9%
Sum rest 136583 58.1%

POINT: 11
Sum show 106093 45.2%
Sum rest 128874 54.8%

POINT: 12
Sum show 113905 48.5%
Sum rest 121062 51.5%

It means that (independent of time expiry) 38.7% of queries are 9 characters or less.

From jQuery to Cash

June 18, 2019
4 comments Web development, JavaScript

tl;dr; The main JavaScript bundle goes from 29KB to 6KB by switching from JQuery to Cash. Both with Brotli compression.

In Web Performance, every byte counts. Downloading less stuff means faster network operations but for JavaScript it also means less to parse and execute. This site used use JQuery 3.4.1 but now uses Cash 4.1.2. It requires some changes to how you use $ and most noticeable is the lack of animations and $.ajax.

I still stand by the $ function. It's great when you have a regular (static) website that isn't a single page app but still needs a little bit of interactive JavaScript functionality. On this site, I use it for making the commenting work and some various navigation/header stuff.

Switching to Cash means you have to stop doing things like $.getJSON() and $('.classname').fadeIn(400) which, in a sense, gives Cash an unfair advantage because those bits take up a large portion of the bundle size. Yes, there is a custom build of jQuery without those but check out this size comparison:

Bundle Uncompressed (bytes) Gzipped (bytes)
jQuery 3.4.1 88,145 30,739
jQuery 3.4.1 Slim 71,037 24,403
Cash 4.1.2 14,818 5,167

I still needed a fadeIn function, which I was relying on from jQuery, but to remedy that I just copied one of these from youmightnotneedjquery.com. It would be better to not do that an use a CSS transform instead but, well, I'm only human.

Before: with jQuery
Before: with jQuery

Another thing you'll need to replace is to switch from $.ajax to fetch but there are good polyfills but I haven't bothered with polyfills because the tiny percentage of visitors I have, without fetch support still get a working site but can't post comments.

I was contemplating doing what GitHub did in 2018 which was to replace jQuery with real vanilla JavaScript code but it didn't seem worth it now that Cash is only 5KB (gzipped) and it's an actively maintained project too.

Before: with jQuery
Before: with jQuery

After: with Cash
After: with Cash

Build an XML sitemap of XML sitemaps

June 1, 2019
0 comments Django, Python

Suppose that you have so many thousands of pages that you can't just create a single /sitemap.xml file that has all the URLs (aka <loc>) listed. Then you need to make a /sitemaps.xml that points to the other sitemap files. And if you're in the thousands, you'll need to gzip these files.

The blog post demonstrates how Song Search generates a sitemap file that points to 63 sitemap-{M}-{N}.xml.gz files which spans about 1,000,000 URLs. The context here is Python and the getting of the data is from Django. Python is pretty key here but if you have something other than Django, you can squint and mentally replace that with your own data mapper.

Generate the sitemap .xml.gz file(s)

Here's the core of the work. A generator function that takes a Django QuerySet instance (that is ordered and filtered!) and then starts generating etree trees and dumps them to disk with gzip.


import gzip

from lxml import etree


outfile = "sitemap-{start}-{end}.xml"
batchsize = 40_000


def generate(self, qs, base_url, outfile, batchsize):
    # Use `.values` to make the query much faster
    qs = qs.values("name", "id", "artist_id", "language")

    def start():
        return etree.Element(
            "urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        )

    def close(root, filename):
        with gzip.open(filename, "wb") as f:
            f.write(b'<?xml version="1.0" encoding="utf-8"?>\n')
            f.write(etree.tostring(root, pretty_print=True))

    root = filename = None

    count = 0
    for song in qs.iterator():
        if not count % batchsize:
            if filename:  # not the very first loop
                close(root, filename)
                yield filename
            filename = outfile.format(start=count, end=count + batchsize)
            root = start()
        loc = "{}{}".format(base_url, make_song_url(song))
        etree.SubElement(etree.SubElement(root, "url"), "loc").text = loc
        count += 1
    close(root, filename)
    yield filename

The most important lines in terms of lxml.etree and sitemaps are:


root = etree.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
...         
etree.SubElement(etree.SubElement(root, "url"), "loc").text = loc

Another important thing is the note about using .values(). If you don't do that Django will create a model instance for every single row it returns of the iterator. That's expensive. See this blog post.

Another important thing is to use a Django ORM iterator as that's much more efficient than messing around with limits and offsets.

Generate the map of sitemaps

Making the map of maps doesn't need to be gzipped since it's going to be tiny.


def generate_map_of_maps(base_url, outfile):
    root = etree.Element(
        "sitemapindex", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    )

    with open(outfile, "wb") as f:
        f.write(b'<?xml version="1.0" encoding="UTF-8"?>\n')
        files_created = sorted(glob("sitemap-*.xml.gz"))
        for file_created in files_created:
            sitemap = etree.SubElement(root, "sitemap")
            uri = "{}/{}".format(base_url, os.path.basename(file_created))
            etree.SubElement(sitemap, "loc").text = uri
            lastmod = datetime.datetime.fromtimestamp(
                os.stat(file_created).st_mtime
            ).strftime("%Y-%m-%d")
            etree.SubElement(sitemap, "lastmod").text = lastmod
        f.write(etree.tostring(root, pretty_print=True))

And that sums it up. On my laptop, it takes about 60 seconds to generate 39 of these files (e.g. sitemap-1560000-1600000.xml.gz) and that's good enough.

Bonus and Thoughts

The bad news is that this is about as good as it gets in terms of performance. The good news is that there are no low-hanging fruit fixes. I know, because I tried. I experimented with not using pretty_print=True and I experimented with not writing with gzip.open and instead gzipping the files on later. Nothing made any significant difference. The lxml.etree part of this, in terms of performance, is order of maginitude marginal in comparison to the cost of actually getting the data out of the database plus later writing to disk. I also experimenting with generating the gzip content with zopfli and it didn't make much of a difference.

I originally wrote this code years ago and when I did, I think I knew more about sitemaps. In my implementation I use a batch size of 40,000 so each file is called something like sitemap-40000-80000.xml.gz and weighs about 800KB. Not sure why I chose 40,000 but perhaps not important.

Generate a random IP address in Python

June 1, 2019
0 comments Django, Python

I have a commenting system where people can type in a comment and optionally their name and email if they like.
In production, where things are real, the IP address that can be collected are all interestingly different. But when testing this manually on my laptop, since the server is running http://localhost:8000, the request.META.get('REMOTE_ADDR') always becomes 127.0.0.1. Boring! So I fake it. Like this:


import random
from ipaddress import IPv4Address


def _random_ip_address(seed):
    random.seed(seed)
    return str(IPv4Address(random.getrandbits(32)))


...
# Here's the code deep inside the POST handler just before storing 
# the form submission the database.

if settings.DEBUG and metadata.get("REMOTE_ADDR") == "127.0.0.1":
    # Make up a random one!
    metadata["REMOTE_ADDR"] = _random_ip_address(
        str(form.cleaned_data["name"]) + str(form.cleaned_data["email"])
    )

It's pretty rough but it works and makes me happy.

How I simulate a CDN with Nginx

May 15, 2019
1 comment Python, Nginx

Usually, a CDN is just a cache you put in front of a dynamic website. You set up the CDN to be the first server your clients get data from, the CDN quickly decides if it was a copy cached or otherwise it asks the origin server for a fresh copy. So far so good, but if you really care about squeezing that extra performance out you need to worry about having a decent TTL and as soon as you make the TTL more than a couple of minutes you need to think about cache invalidation. You also need to worry about preventing certain endpoints from ever getting caught in the CDN which could be very bad.

For this site, www.peterbe.com, I'm using KeyCDN which I've blogged out here: "I think I might put my whole site behind a CDN" and here: "KeyCDN vs. DigitalOcean Nginx". KeyCDN has an API and a python client which I've contributed to.

The next problem is; how do you test all this stuff on your laptop? Unfortunately, you can't deploy a KeyCDN docker image or something like that, that attempts to mimic how it works for reals. So, to simulate a CDN locally on my laptop, I'm using Nginx. It's definitely pretty different but it's not the point. The point is that you want something that acts as a reverse proxy. You want to make sure that stuff that's supposed to be cached gets cached, stuff that's supposed to be purged gets purged and that things that are always supposed to be dynamic is always dynamic.

The Configuration

First I add peterbecom.local into /etc/hosts like this:

▶ cat /etc/hosts | grep peterbecom.local
127.0.0.1       peterbecom.local origin.peterbecom.local
::1             peterbecom.local origin.peterbecom.local

Next, I set up the Nginx config (running on port 80) and the configuration looks like this:

proxy_cache_path /tmp/nginxcache  levels=1:2    keys_zone=STATIC:10m
    inactive=24h  max_size=1g;

server {
    server_name peterbecom.local;
    location / {
        proxy_cache_bypass $http_secret_header;
        add_header X-Cache $upstream_cache_status;
        proxy_set_header x-forwarded-host $host;
        proxy_cache STATIC;
        # proxy_cache_key $uri;
        proxy_cache_valid 200  1h;
        proxy_pass http://origin.peterbecom.local;
    }
    access_log /tmp/peterbecom.access.log combined;
    error_log /tmp/peterbecom.error.log info;
}

By the way, I've also set up origin.peterbecom.local to be run in Nginx too but it could just be proxy_pass http://localhost:8000; to go straight to Django. Not relevant for this context.

The Purge

Without the commercial version of Nginx (Plus) you can't do easy purging just for purging sake. But with proxy_cache_bypass $http_secret_header; it's very similar to purging except that it immediately makes a request to the origin.

First, to test that it works, I start up Nginx and Django and now I can run:

▶ curl -v http://peterbecom.local/about > /dev/null
< HTTP/1.1 200 OK
< Server: nginx/1.15.10
< Cache-Control: public, max-age=3672
< X-Cache: MISS
...

(Note the X-Cache: MISS which comes from add_header X-Cache $upstream_cache_status;)

This should trigger a log line in /tmp/peterbecom.access.log and in the Django runserver foreground logs.

At this point, I can kill the Django server and run it again:

▶ curl -v http://peterbecom.local/about > /dev/null
< Server: nginx/1.15.10
< HTTP/1.1 200 OK
< Cache-Control: max-age=86400
< Cache-Control: public
< X-Cache: HIT
...

Cool! It's working without Django running. As expected. This is how to send a "purge request"

▶ curl -v -H "secret-header:true" http://peterbecom.local/about > /dev/null
> GET /about HTTP/1.1
> secret-header:true
>
< HTTP/1.1 502 Bad Gateway
...

Clearly, it's trying to go to the origin, which was killed, so you start that up again and you get back to:

▶ curl -v http://peterbecom.local/about > /dev/null
< HTTP/1.1 200 OK
< Server: nginx/1.15.10
< Cache-Control: public, max-age=3672
< X-Cache: MISS
...

In Python

In my site, there are Django signals that are triggered when a piece of content changes and I'm using python-keycdn-api in production but obviously, that won't work with Nginx. So I have a local setting and my Python code looks like this:



# This function gets called by a Django `post_save` signal
# among other things such as cron jobs and management commands.

def purge_cdn_urls(urls):
    if settings.USE_NGINX_BYPASS:
        # Note! This Nginx trick will not just purge the proxy_cache, it will
        # immediately trigger a refetch.
        x_cache_headers = []
        for url in urls:
            if "://" not in url:
                url = settings.NGINX_BYPASS_BASEURL + url
            r = requests.get(url, headers={"secret-header": "true"})
            r.raise_for_status()
            x_cache_headers.append({"url": url, "x-cache": r.headers.get("x-cache")})
        print("PURGED:", x_cache_headers)
        return 

    ...the stuff that uses keycdn...

Notes and Conclusion

One important feature is that my CDN is a CNAME for www.peterbe.com but it reaches the origin server on a different URL. When my Django code needs to know the outside facing domain, I need to respect that. The communication between by the CDN and my origin is a domain I don't want to expose. What KeyCDN does is that they send an x-forwarded-host header which I need to take into account when understanding what outward facing absolute URL was used. Here's how I do that:


def get_base_url(request):
    base_url = ["http"]
    if request.is_secure():
        base_url.append("s")
    base_url.append("://")
    x_forwarded_host = request.headers.get("X-Forwarded-Host")
    if x_forwarded_host and x_forwarded_host in settings.ALLOWED_HOSTS:
        base_url.append(x_forwarded_host)
    else:
        base_url.append(request.get_host())
    return "".join(base_url)

That's about it. There are lots of other details I glossed over but the point is that this works good enough to test that the cache invalidation works as expected.

WebSockets vs. XHR 2019

May 5, 2019
0 comments Web development, Web Performance, JavaScript

Back in 2012, I did an experiment to compare if and/or how much faster WebSockets are compared to AJAX (aka. XHR). It would be a "protocol benchmark" to see which way was faster to schlep data back and forth between a server and a browser in total. The conclusion of that experiment was that WebSockets were faster but when you take latency into account, the difference was minimal. Considering the added "complexities" of WebSockets (keeping connections, results don't come where the request was made, etc.) it's not worth it.

But, 7 years later browsers are very different. Almost all browsers that support JavaScript also support WebSockets. HTTP/2 might make things better too. And perhaps the WebSocket protocol is just better implemented in the browsers. Who knows? An experiment knows.

So I made a new experiment with similar tech. The gist of the code is best explained with some code:


// Inside App.js

loopXHR = async count => {
  const res = await fetch(`/xhr?count=${count}`);
  const data = await res.json();
  const nextCount = data.count;
  if (nextCount) {
    this.loopXHR(nextCount);
  } else {
    this.endXHR();
  }
};

Basically, pick a big number (e.g. 100) and send that integer to the server which does this:


# Inside app.py 

# from the the GET querystring "?count=123"
count = self.get_argument("count")   
data = {"count": int(count) - 1}
self.write(json.dumps(data))

So the browser keeps sending the number back to the server that decrements it and when the server returns 0 the loop ends and you look how long the whole thing took.

Try It

The code is here: https://github.com/peterbe/sockshootout2019

And the demo app is here: https://sockshootout.app (Just press "Start!", wait and press it 2 or 3 more times)

Location, location, location

What matters is the geographical distance between you and the server. The server used in this experiment is in New York, USA.

What you'll find is that the closer you are to the server (lower latency) the better WebSocket performs. Here's what mine looks like:

My result between South Carolina, USA and New York, USA
My result between South Carolina, USA and New York, USA

Now, when I run the whole experiment all on my laptop the results look very different:

Running all locally
Running all locally

I don't have a screenshot for it but a friend of mine ran this from his location in Perth, Australia. There was no difference. If any difference it was "noise".

Same Conclusion?

Yes, latency matters most. The technique, for the benefit of performance, doesn't matter much.

No matter how fancy you're trying to be, what matters is the path the bytes have to travel. Or rather, the distance the bytes have to travel. If you're far away a large majority of the total time is sending and receiving the data. Not the time it takes the browser (or the server) to process it.

However, suppose you do have all your potential clients physically near the server, it might be beneficial to use WebSockets.

Thoughts and Conclusions

My original thought was to use WebSockets instead of XHR for an autocomplete widget. At almost every keystroke, you send it to the server and as search results come in, you update the search result display. Things like that need to be fast and "snappy". But that's not where WebSockets shine. They shine in their ability to actively await results without having a loop that periodically pulls. There's nothing wrong with WebSocket and it has its brilliant use cases.

In summary, don't bother just to get a single-digit percentage performance increase if the complexity of the code and infrastructure is non-trivial. Keep building cool stuff with WebSockets but if you expect one result per action, XHR is good enough.

Bonus

The experiment app does collect everyone's results (just the timings and IP) and I hope to find the time to process this and build graph a correlating the geographical distance compared to the difference between the two techniques. Watch this space!

By the way, if you do plan on writing some WebSocket implementation code I highly recommend Sockette. It's solid and easy to use.

Whatsdeployed rewritten in React

April 15, 2019
0 comments Web development, Python, React, JavaScript

A couple of months ago my colleague Michael @mythmon Cooper wanted to add a feature to the front-end code of Whatsdeployed and learned that the whole front-end is spaghetti jQuery code. So, instead, he re-wrote it in React. My only requirements were "Use create-react-app and no redux", i.e. keep it simple.

We also took the opportunity to rewrite some of the ways that URLs are handled. It used to be that a "short link" would redirect. For example GET /s-5HY would return 302 to Location: ?org=mozilla&repo=tecken&name[]=Dev&url[]=https://symbols.dev.mozaws.net/__version__&name[]=Stage... Basically, the short link was just an alias for a redirect. Just like those services like bit.ly or g.co. Now, the short link is a permanent fixture. The short link is included in the XHR calls to the server for getting the relevant data.

All old URLs will continue to work but now the canonical URL becomes /s/5HY/mozilla-services/tecken, for example. The :org/:repo isn't really necessary because the server knows exactly what 5HY (in this example means), but it's nice for the URL bar's memory.

Another thing that changed was how it can recognize "bors commits". When you use bors, you put a bunch of commits into a GitHub Pull Request and then ask the bors bot to merge them into master. Using "bors mode" in Whatsdeployed is optional but we believe it looks a lot more user-friendly. Here is an example of mozilla/normandy with and without bors toggled on and off.

Without "bors mode"
Without "bors mode"

With "bors mode"
With "bors mode"

Thank you mythmon!

Lastly, hopefully this will make it a lot easier to contribute. Check out https://github.com/peterbe/whatsdeployed. All you need is Python 3, a PostgreSQL, and almost any version of Node that can run create-react-apps. Ping me if you find it hard to get up and running.

Format numbers with numberWithCommas() or Number.toLocaleString()

March 5, 2019
1 comment Web development, JavaScript

In a highly unscientific survey of exactly 2 French native friends, I asked them what they think about formatting large numbers the "French way" versus doing it the "English way". In particular, if the rest of content/app is English, would it be jarring if the formatting of numbers was French. Both Adrian and Mathieu said they prefer displaying the number the French way even if the app/content is French.

If you have an English browser opening https://codepen.io/peterbe/pen/xBRGoN means it's going to display the two numbers the same way. But if you have a French locale in your browser it'll look like this:

French

Number.toLocaleString() is now universally supported so that's no longer a worry.

For years I was using a function like this:


/* http://stackoverflow.com/a/2901298 */
function numberWithCommas(x) {
  var parts = x.toString().split('.');
  parts[0] = parts[0].replace(/\B(?=(\d{3})+(?!\d))/g, ',');
  return parts.join('.');
}

numberWithCommas(100000000);
// "100,000,000"

jsperf
and it works and is reasonably fast too but it's so tempting to not use and instead stand on the shoulder-of-browsers to supply this functionality instead. But consider the alternative:


(100000000).toLocaleString();
// "100,000,000"

The thing that's always worried me is; What will someone's reaction be if the texts are in one locale and the formatting of numbers (and dates, etc!) are in another locale?

All 2 of the people I asked say they don't mind the mixing but admit that it's weird but that ultimately they prefer "their" format.

If you're non-English browser, what do you prefer? If you're a web usability expert, please, too, drop a comment to share what you think.