Filtered by JavaScript, Python

Page 25

Reset

Headsupper.io

December 5, 2015
0 comments Python, Web development, Django, JavaScript, React

tl;dr

Headsupper.io is a free GitHub webhook service that emails people when commits have the configurable keyword "headsup" in it.

Introduction

Headsupper.io is great for when you have a GitHub project with multiple people working on it and when you make a commit you want to notify other people by email.

Basically, you set up a GitHub Webhook, on pushes, to push to https://headsupper.io and then it'll parse the incoming push and its commits and look for certain things in the commit message. By default, it'll look for the word "headsup". For example, a git commit message might look like this:

fixes #123 - more juice in the Saab headsup! will require updating

Or you can use the multi-line approach where the first line is short and sweat and after the break a bit more elaborate:

bug 1234567 - tea kettle upgrade 2.1

Headsup: Next time you git pull from master, remember to run 
peep install on the requirements.txt file since this commit 
introduces a bunch of crazt dependency changes.

Git commits that come through that don't have any match on this word will simply be ignored by Headsupper.

How you use it

Maybe paradoxically, you need to authenticate with your GitHub account but that's in read-only mode and does NOT set up the Webhook for you. The reason you have to authenticate to prepare a configuration on headsupper.io is to tie the configuration to a real user.

Once you've authenticated you get the option to create your first configuration, then you have to enter at least these three piece of information:

  1. The GitHub "full name". This is the org name, slash, repo name. E.g. peterbe/django-peterbecom or mozilla/socorro.
  2. Pick a secret. Remember what you typed, because you'll need to type in this same secret when you set up the Webhook on your GitHub project's Webhooks page. (This is used to checksum and verify the source of the Webhook push)
  3. Who to send to. A list of email addresses separated with a newline or a semi-colon.

Once you've set that up, you'll need to go to your GitHub project's Setting page and enter a new Webhook and the URL you need to type in is https://headsupper.io and for the "Secret" type in that secret you used earlier. That's it!

Rules and options

The word that triggers is configurable by you. The default is headsupper. And by default, it's case insensitive. You can change that so it's case sensitive. Also, the word has to be word delimited on the left (e.g. a space or a newline character) and on the right it needs to be a space, a : or a !. So this won't match: theheadsup: or headsupper.

Other optional things you can configure are:

  • Which git branch to trigger on (by default it's master)
  • Which emails to CC when it sends
  • Which emails to BCC when it sends
  • Only send when you make a tag

That last option, Only send when a new tag is created, is interesting. I added that option because at work, we make production server releases by pushing a git tag. When a tag is pushed, all those commits are sent to the continuous deployment service which makes a server upgrade. This means you get a chance to enter a heads up message to be emailed to the people who care about new deployments going out.

How it was built

It's a mix between Django and ReactJS. The whole client-side app it built statically with Webpack in ES6. It's served as static files through Nginx. But Nginx is making an exception on all URLs that start with /api or /accounts. The /api/* it used for loading and setting JSON. The /accounts/* is used for the GitHub OAuth endpoints.

What's interesting about this the architecture is that it's using HTTP cookies. Not API tokens. Cookies are quite good in that they're established and the browser does all the automated work of keeping it secure and making each request potentially authenticated.

Here's the relevant React code and here's the relevant Django code that processes the Webhook.

The whole project is available on: https://github.com/peterbe/headsupper.

Also, I made a demo at the November Mozilla Beer and Tell.

Django forms and making datetime inputs localized

December 4, 2015
2 comments Python, Django

tl;dr

To change from one timezone aware datetime to another, turn it into a naive datetime and then use pytz's localize() method to convert it back to the timezone you want it to be.

Introduction

Suppose you have a Django form where you allow people to enter a date, e.g. 2015-06-04 13:00. You have to save it timezone aware, because you have settings.USE_TZ on and it's just many times to store things in timezone aware dates.

By default, if you have settings.USE_TZ and no timezone information is in the string that the django.form.fields.DateTimeField parses, it will use settings.TIME_ZONE and that timezone might be different from what it really should be. For example, in my case, I have an app where you can upload a CSV file full of information about events. These events belong to a venue which I have in the database. Every venue has a timezone, e.g. Europe/Berlin or US/Pacific. So if someone uploads a CSV file for the Berlin location 2015-06-04 13:00 means 13:00 o'clock in Berlin. I don't care where the server is hosted and what its settings.TIME_ZONE is. I need to make that input timezone aware specifically for Berlin/Europe.

Examples

Suppose you have settings.TIME_ZONE == 'US/Pacific' and you let the django.form.fields.DateTimeField do its magic you get something you don't want:


>>> from django.conf import settings
>>> settings.TIME_ZONE
'US/Pacific'
>>> assert settings.USE_TZ
>>> from django.forms.fields import DateTimeField
>>> DateTimeField().clean('2015-06-04 13:00')
datetime.datetime(2015, 6, 4, 13, 0, tzinfo=<DstTzInfo 'US/Pacific' PDT-1 day, 17:00:00 DST>)

See! That's wrong. Sort of. Not Django's fault. What I need to do is to convert that datetime object into one that is timezone aware on the Europe/Berlin timezone.

In old versions of pytz, specifically <=2014.2 you could do this:


>>> import pytz
>>> pytz.VERSION
'2014.2'
>>> from django.forms.fields import DateTimeField
>>> date = DateTimeField().clean('2015-06-04 13:00')
>>> date
datetime.datetime(2015, 6, 4, 13, 0, tzinfo=<DstTzInfo 'US/Pacific' PDT-1 day, 17:00:00 DST>)
>>> date.replace(tzinfo=tz)
datetime.datetime(2015, 6, 4, 13, 0, tzinfo=<DstTzInfo 'Europe/Berlin' CET+1:00:00 STD>)

But in modern versions of pytz you can't do that because if you don't use the pytz.timezone instance to localize it will use the default version which might be one of those crazy "Local Mean Time" which they used a 100 years ago. E.g.


>>> import pytz
>>> pytz.VERSION
'2015.7'
>>> from django.forms.fields import DateTimeField
>>> date = DateTimeField().clean('2015-06-04 13:00')
>>> tz = pytz.timezone('Europe/Berlin')
>>> date.replace(tzinfo=tz)
datetime.datetime(2015, 6, 4, 13, 0, tzinfo=<DstTzInfo 'Europe/Berlin' LMT+0:53:00 STD>)

See, it's that crazy LMT+0:53:00 that's oft talked of on Stackoverflow!

Here's the trick

The trick is to use pytz.timezone(MY TIME ZONE NAME).localize(MY NAIVE DATETIME OBJECT). When you use the .localize() method pytz can use the date to make sure it uses the right conversion for that named timezone.

And in the case of our overly smart django.form.fields.DateTimeField it means we need to convert it back into a naive datetime object and then localize it.


>>> import pytz
>>> pytz.VERSION
'2015.7'
>>> from django.forms.fields import DateTimeField
>>> date = DateTimeField().clean('2015-06-04 13:00')
>>> date = date.replace(tzinfo=None)
>>> date
datetime.datetime(2015, 6, 4, 13, 0)
>>> tz = pytz.timezone('Europe/Berlin')
>>> tz.localize(date)
datetime.datetime(2015, 6, 4, 13, 0, tzinfo=<DstTzInfo 'Europe/Berlin' CEST+2:00:00 DST>)

That was much harder than it needed to be. Timezones are hard. Especially when you have the human element of people typing in things and just, rightfully, expect the system to figure it out and get it right.

I hope this helps the next schmuck who has/had to set aside an hour to figure this out.

Whatsdeployed

November 11, 2015
4 comments Python, Web development, Mozilla

Whatsdeployed was a tool I developed for my work at Mozilla. I think many other organizations can benefit from using it too.

So, on many sites, what we do when deploying a site, is that we note which git sha was used and write that to a file which is then exposed via the web server. Like this for example. If you know that sha and what's at the tip of the master branch on the project's GitHub page, you can build up an interesting dashboard that allows you to see what's available and what's been deployed.

Sample Whatsdeployed screen for the Mozilla Socorro project
The other really useful case is when you have more than just one environment. For example, you might have a dev, stage and prod environment and, always lastly, the master branch on GitHub. Now you can see what code has been shipped on prod versus your staging environment for example.

This is one of those far too few projects that you build quickly one Friday afternoon and it turns out to be surprisingly useful to a lot of people. I for one, check various projects like this several times per day.

The code is on GitHub and it's basically a tiny bit of Flask with some jQuery doing a couple of AJAX requests. If you enjoy it and use it, please share.

UPDATE

Blogged about a facelift, Jan 2018

Chainable catches in a JavaScript promise

November 5, 2015
6 comments Web development, JavaScript

If you have a Promise that you're executing, you can chain multiple things quite nicely by simply returning the value as it "passes through".
For example:


new Promise((resolve) => {
  resolve('some value')
})
.then((value) => {
  console.log('1', value)
  return value
})
.then((value) => {
  console.log('2', value)
  return value
})

This will console log

1 some value
2 some value

And you can add more .then() to it. As many as you like. Just remember to "play ball" by passing the value. In fact, you can actually pass a different value. Like this for example:


new Promise((resolve) => {
  resolve('some value')
})
.then((value) => {
  console.log('1', value)
  return value
})
.then((value) => {
  console.log('2', value)
  return value.toUpperCase()
})
.then((value) => {
  console.log('3', value)
  return value
})

Demo here. This'll console log

1 some value
2 some value
3 SOME VALUE

But how do you do the same with multiple .catch()?

This is NOT how you do it:


new Promise((resolve, reject) => {
  reject('some reason')
})
.catch((reason) => {
  console.warn('1', reason)
  return reason
})
.catch((reason) => {
  console.warn('2', reason)
  return reason
})

Demo here. When you run that you just get:

1 some reason

To chain catches you have to re-raise (aka re-throw) it:


new Promise((resolve, reject) => {
  reject('some reason')
})
.catch((reason) => {
  console.warn('1', reason)
  throw reason
})
.catch((reason) => {
  console.warn('2', reason)
})

Demo here. The output if you run this is:

1 some value
2 some value

But you have to be a bit more careful here. Note that in the second .catch() it doesn't re-throw the reason one last time. If you do that, you get a general JavaScript error on that page. I.e. an unhandled error that makes it all the way out to the web console. Meaning, you have to be aware of errors and take care of them.

Why does this matter?

It matters because you might want to have a, for example, low level and a high level dealing with errors. For example, you might want to log all exceptions AND still pass them along so that higher level code can be aware of it. For example, suppose you have a function that fetches data using the fetch API. You use it from multiple places and you don't want to have to log it everywhere. Instead, that wrapping function can be responsible for logging it but you still have to deal with it.

For example, this is contrived by not totally unrealistic code:


let fetcher = (url) => {
  // this function might be more advanced
  // and do other fancy things
  return fetch(url)
}

// 1st
fetcher('http://example.com/crap')
.then((response) => {
  document.querySelector('#result').textContent = response
})
.catch((exception) => {
  console.error('oh noes!', exception)
  document.querySelector('#result-error').style['display'] = 'block'
})

// 2nd
fetcher('http://example.com/other')
.then((response) => {
  document.querySelector('#other').textContent = response
})
.catch((exception) => {
  console.error('oh noes!', exception)
  document.querySelector('#other-error').style['display'] = 'block'
})

Demo here

Notice how each .catch() handler does the same kind of logging but deals with the error in a human way differently.
Wouldn't it be nice if you could have a general and central .catch() for logging but continue dealing with the errors in a human way?

Here's one such example:


let fetcher = (url) => {
  // this function might be more advanced
  // and do other fancy things
  return fetch(url)
  .catch((exception) => {
    console.error('oh noes! on:', url, 'exception:', exception)
    throw exception
  })
}

// 1st
fetcher('http://example.com/crap')
.then((response) => {
  document.querySelector('#result').textContent = response
})
.catch(() => {
  document.querySelector('#result-error').style['display'] = 'block'
})

// 2nd
fetcher('http://example.com/other')
.then((response) => {
  document.querySelector('#other').textContent = response
})
.catch(() => {
  document.querySelector('#other-error').style['display'] = 'block'
})

Demo here

Here you get the best of both worlds. You have a central place where all exceptions are logged in a nice way, and the higher level code only has to deal with the human way of explaining that something went wrong.

It's pretty basic but it's probably useful to somebody else who gets confused about how to deal with exceptions in promises.

How to "onchange" in ReactJS

October 21, 2015
28 comments JavaScript, React

Normally, in vanilla Javascript, the onchange event is triggered after you have typed something into a field and then "exited out of it", e.g. click outside the field so the cursor isn't blinking in it any more. This for example


document.querySelector('input').onchange = function(event) {
  document.querySelector('code').textContent = event.target.value;
}

First of all, let's talk about what this is useful for. One great example is a sign-up form where you have to pick a username or type in an email address or something. Before the user gets around to pressing the final submit button you might want to alert them early that their chosen username is available or already taken. Or you might want to alert early that the typed in email address is not a valid one. If you execute that kind of validation on every key stroke, it's unlikely to be a pleasant UI.

Problem is, you can't do that in ReactJS. It doesn't work like that. The explanation is quite non-trivial:

*"<input type="text" value="Untitled"> renders an input initialized with the value, Untitled. When the user updates the input, the node's value property will change. However, node.getAttribute('value') will still return the value used at initialization time, Untitled.

Unlike HTML, React components must represent the state of the view at any point in time and not only at initialization time."*

Basically, you can't easily rely on the input field because the state needs to come from the React app's state, not from the browser's idea of what the value should be.

You might try this


var Input = React.createClass({
  getInitialState: function() {
    return {typed: ''};
  },
  onChange: function(event) {
    this.setState({typed: event.target.value});
  },
  render: function() {
    return <div>
        <input type="text" onChange={this.onChange.bind(this)}/>
        You typed: <code>{this.state.typed}</code>
      </div>
  }
});
React.render(<Input/>, document.querySelector('div'));

But what you notice is the the onChange handler is fired on every key stroke. Not just when the whole input field has changed.

So, what to do?

The trick is surprisingly simple. Use onBlur instead!

Same snippet but using onBlur instead


var Input = React.createClass({
  getInitialState: function() {
    return {typed: ''};
  },
  onBlur: function(event) {
    this.setState({typed: event.target.value});
  },
  render: function() {
    return <div>
        <input type="text" onBlur={this.onBlur.bind(this)}/>
        You typed: <code>{this.state.typed}</code>
      </div>
  }
});
React.render(<Input/>, document.querySelector('div'));

Now, your handler is triggered after the user has finished with the field.

localStorage is not async, but it's FAST!

October 6, 2015
7 comments Web development, AngularJS, JavaScript

A long time I go I wrote an angular app that was pleasantly straight forward. It loads all records from the server in one big fat AJAX GET. The data is large, ~550Kb as a string of JSON, but that's OK because it's a fat-client app and it's extremely unlikely to grow any multiples of this. Yes, it'll some day go up to 1Mb but even that is fine.

Once ALL records are loaded with AJAX from the server, you can filter the whole set and paginate etc. It feels really nice and snappy. However, the app is slightly smarter than that. It has two cool additional features...

  1. Every 10 seconds it does an AJAX query to ask "Have any records been modified since {{insert latest modify date of all known records}}?" and if there's stuff, it updates.

  2. All AJAX queries from the server are cached in the browser's local storage (note, I didn't write localStorage, "local storage" encompasses multiple techniques). The purpose of that is to that on the next full load of the app, we can at least display what we had last time whilst we wait for the server to return the latest and greatest via a slowish network request.

  3. Suppose we have brand new browser with no local storage, because the default sort order is always known, instead of doing a full AJAX get of all records, it does a small one first: "Give me the top 20 records ordered by modify date" and once that's in, it does the big full AJAX request for all records. Thus bringing data to the eyes faster.

All of these these optimization tricks are accompanied with a flash message at the top that says: <img src="spinner.gif"> Currently using cached data. Loading all remaining records from server....

When I built this I decided to use localForage which is a convenience wrapper over localStorage AND IndexedDB that does it all asynchronously and with proper promises. And to make it work in AngularJS I used angular-localForage so it would work with Angular's cycle updates without custom $scope.$apply() stuff. I thought the advantage of this is that it being async means that the main event can continue doing important rendering stuff whilst the browser saves things to "disk" in the background.

Also, I was once told that localStorage, which is inherently blocking, has the risk that calling it the first time in a while might cause the browser to have to take a major break to boot data from actual disk into the browsers allocated memory. Turns out, that is extremely unlikely to be a problem (more about this is a future blog post). The warming up of fetching from disk and storing into the browser's memory happens when you start the browser the very first time. Chrome might be slightly different but I'm confident that this is how things work in Firefox and it has for many many months.

What's very important to note is that, by default, localForage will use IndexedDB as the storage backend. It has the advantage that it's async to boot and it supports much large data blobs.

So I timed, how long does it take for localForage to SET and GET the ~500Kb JSON data? I did that like this, for example:


var t0 = performance.now();
$localForage.getItem('eventmanager')
.then(function(data) {
    var t1 = performance.now();
    console.log('GET took', t1 - t0, 'ms');
    ...

The results are as follows:

Operation Iterations Average time
SET 4 341.0ms
GET 4 184.0ms

In all fairness, it doesn't actually matter how long it takes to save because my app actually doesn't depend on waiting for that promise to resolve. But it's an interesting number nevertheless.

So, here's what I did. I decided to drop all of that fancy localForage stuff and go back to basics. All I really need is these two operations:


// set stuff
localStorage.setItem('mykey', JSON.stringify(data))
// get stuff
var data = JSON.parse(localStorage.getItem('mykey') || '{}')

So, after I've refactored my code and deleted (6.33Kb + 22.3Kb) of extra .js files and put some performance measurements in:

Operation Iterations Average time
SET 4 5.9ms
GET 4 3.3ms

Just WOW!
That is so much faster. Sure the write operation is now blocking, but it's only taking 6 milliseconds. And the reason it took IndexedDB less than half a second also probably means more hard work for it to sweat CPU over.

Sold? I am :)

Using Lovefield as an AJAX proxy maybe

September 30, 2015
1 comment Web development, JavaScript

Lovefield, by Arthur Hsu at Google, is a cool little Javascript browser abstraction on top of IndexedDB. IndexedDB is this amazingly powerful asynchronous framework for storing data in the browser tied to the domain you're visiting. Unlike it's much "simpler" sibling localStorage, with IndexedDB you can store individual keys in a schema, use indexes for faster retrieval, asynchronous querying and much larger memory capacity than general DOM storage (e.g. localStorage and sessionStorage).

What Lovefield brings is best described by watching this video. But to save you time let me try...:

  • "Lovefield is a relational database for web apps"
  • Adds structural query capability to IndexedDB without having to work with any callbacks
  • It's not plain SQL strings that get executed but instead something similar to an ORM (e.g. SQLAlchemy)
  • Supports indexes for speedier lookups
  • Supports doing joins across different "tables"
  • Works in IE, Chrome, Firefox and Safari (although I don't know about iOS Safari)

Anyway, it sounds really cool and I'm looking forward to using it for something. But before I do I thought I'd try using it as a "AJAX proxy".

So what's an AJAX proxy, you ask. Well, it can mean many things but what I have in mind is a pattern where a web app's MVC is tied to a local storage and the local storage is tied to AJAX. That means two immediate benefits:

  • The MVC part of the web app is divorced from understanding network APIs.
  • The web app becomes offline capable to boot. Being able to retrieve fresh data from the network (or send!) is a luxury that you automatically get if you have a network connection.

Another subtle benefit is a "corner case" of that offline capability; when you load up the app you can read from local storage much much faster than from the network meaning you can display user data on the screen sooner. (With the obvious caveat that it might be stale and that the data will change once the network read is completed later)

So, to take this idea for a spin (the use of a local storage to remember the loaded data last time first) I extended my AJAX or Not playground with a hybrid that uses React to render the data, but render the data from Lovefield (and from localStorage too). Remember, it's an experiment so it's a bit clunky and perhaps contrieved. The objective is to notice how soon after loading the page, that data because available for your eyes to consume.

Here is the playground test page

You have to load it at least once to fill your IndexedDB with some data from an AJAX request. Then, reload the page and it'll display what it has locally (in IndexedDB extracted with the Lovefield API). Then, after it's loaded, try refreshing the browser repeatedly. With or without a Wifi connection.

Basically, it works. However, perhaps I've chosen the worst possible test bed for playing with Lovefield. Because it is super slow. If you open the web console, you'll see it reports how long it takes to extract the data out of Lovefield. The code looks like this:


...
getPostsFromDB: function() {
  return schemaBuilder.connect().then(function(db) {
    var table = db.getSchema().table('post');
    return db.select().from(table).exec();
  });
},
...
var t0 = performance.now();
this.getPostsFromDB()
.then(function(results) {
  var t1 = performance.now();
  console.log(results.length, 'records extracted');
  console.log((t1 - t0).toFixed(2) + 'ms to extract');
  ...

You can see the source here in full.

So out of curiousity, I forked this experiment. Kept almost all the React code but replaced the Lovefield stuff with good old JSON.parse(localStorage.getItem('posts') || '[]'). See code here.
This only takes 1-2 milliseconds. Lovefield repeatedly takes about 400-550 milliseconds on my Firefox version 43.

By the way, load up the localStorage fork and after a first load, try refreshing it over and over and notice how amazingly fast it is. Yay localStorage!

ElasticSearch, snowball analyzer and stop words

September 25, 2015
1 comment Python

Disclaimer: I'm an ElasticSearch noob. Go easy on me

I have an application that uses ElasticSearch's more_like_this query to find related content. It basically works like this:

>>> index(index, doc_type, {'id': 1, 'title': 'Your cool title is here'})
>>> index(index, doc_type, {'id': 2, 'title': 'About is a cool headline'})
>>> index(index, doc_type, {'id': 3, 'title': 'Titles are your big thing'})

Then you can pick one ID (1, 2 or 3) and find related ones.
We can tell by looking at these three silly examples, the 1 and 2 have the words "is" and "cool" in common. 1 and 3 have "title" (stemming taken into account) and "your" in common. However, is there much value in connected these documents on the words "is" and "your"? I think not. Those are stop words. E.g. words like "the", "this", "from", "she" etc. Basically words that are commonly used as "glue" between more unique and specific words.

Anyway, if you index something in ElasticSearch as a text field you get, by default, the "standard" analyzer to analyze the incoming stuff to be indexed. The standard analyzer just splits the words on whitespace. A more compelling analyzer is the Snowball analyzer (original here) which supports intelligent stemming (turning "wife" ~= "wives") and stop words.

The problem is that the snowball analyzer has a very different set of stop words. We did some digging and thought this was the list it bases its English stop words on. But this was wrong. Note that that list has words like "your" and "about" listed there.

The way to find out how your analyzer treats a string and turns it into token is to the the _analyze tool. For example:

curl -XGET 'localhost:9200/{myindexname}/_analyze?analyzer=snowball' -d 'about your special is a the word' | json_print
{
  "tokens": [
    {
      "end_offset": 5,
      "token": "about",
      "type": "<ALPHANUM>",
      "start_offset": 0,
      "position": 1
    },
    {
      "end_offset": 10,
      "token": "your",
      "type": "<ALPHANUM>",
      "start_offset": 6,
      "position": 2
    },
    {
      "end_offset": 18,
      "token": "special",
      "type": "<ALPHANUM>",
      "start_offset": 11,
      "position": 3
    },
    {
      "end_offset": 32,
      "token": "word",
      "type": "<ALPHANUM>",
      "start_offset": 28,
      "position": 7
    }
  ]
}

So what you can see is that it finds the tokens "about", "your", "special" and "word". But it stop word ignored "is", "a" and "the". Hmm... I'm not happy with that. I don't think "about" and "your" are particularly helpful words.

So, how do you define your own stop words and override the one in the Snowball analyzer? Well, let me show you.

In code, I use pyelasticsearch so the index creation is done in Python.


STOPWORDS = (
    "a able about across after all almost also am among an and "
    "any are as at be because been but by can cannot could dear "
    "did do does either else ever every for from get got had has "
    "have he her hers him his how however i if in into is it its "
    "just least let like likely may me might most must my "
    "neither no nor not of off often on only or other our own "
    "rather said say says she should since so some than that the "
    "their them then there these they this tis to too twas us "
    "wants was we were what when where which while who whom why "
    "will with would yet you your".split()
)

def create():
    es = get_connection()
    index = get_index()
    es.create_index(index, settings={
        'settings': {
            'analysis': {
                'analyzer': {
                    'extended_snowball_analyzer': {
                        'type': 'snowball',
                        'stopwords': STOPWORDS,
                    },
                },
            },
        },
        'mappings': {
            doc_type: {
                'properties': {
                    'title': {
                        'type': 'string',
                        'analyzer': 'extended_snowball_analyzer',
                    },
                }
            }
        }
    })

With that in place, now delete your index and re-create it. Now you can use the _analyze tool again to see how it analyzes text on this particular field. But note, to do this we need to know the name of the index we used. (so replace {myindexname} in the URL):

$ curl -XGET 'localhost:9200/{myindexname}/_analyze?field=title' -d 'about your special is a the word' | json_print
{
  "tokens": [
    {
      "end_offset": 18,
      "token": "special",
      "type": "<ALPHANUM>",
      "start_offset": 11,
      "position": 3
    },
    {
      "end_offset": 32,
      "token": "word",
      "type": "<ALPHANUM>",
      "start_offset": 28,
      "position": 7
    }
  ]
}

Cool! Now we see that it considers "about" and "your" as stop words. Much better. This is handy too because you might have certain words that are globally not very common but within your application it's very repeated and not very useful.

Thank you willkg and Erik Rose for your support in tracking this down!

django-semanticui-form

September 14, 2015
2 comments Python, Django

I'm working on a (side)project in Django that uses the awesome Semantic UI CSS framework. This project has some Django forms that are rendered on the server and so I can't let Django render the form HTML or else the CSS framework can't do its magic.

The project is called django-semanticui-form and it's a fork from django-bootstrap-form.

It doesn't come with the Semantic UI CSS files at all. That's up to you. Semantic UI is available as a big fat bundle (i.e. one big .css file) but generally you just pick the components you want/need. To use it in your Django templates simply, create a django.forms.Form instance and render it like this:


{% load semanticui %}

<form>
  {{ myform | semanticui }}
</form>

The project is very quickly put together. The elements I intend to render seem to work but you might find that certain input elements don't work as nicely. However, if you want to help on the project, it's really easy to write tests and run tests. And Travis and automatic PyPI deployment is all set up so pull requests should be easy.

peepin - a great companion to peep

September 10, 2015
0 comments Python

I actually wrote peepin several months ago but forgot to blog about it.
It's a great library that accompanies peep which is a wrapper on top of pip. Actually, it's for pip install. When you normally do pip install -r requirements.txt the only check it does is on the version number, assuming your requirements.txt has lines in it like Django==1.8.4. With peep it does a checksum comparison of the wheel, tarball or zip file. It basically means that the installer will get EXACTLY the same package files as was used by the developer who decides to add it to requirements.txt.

If you're using pip and want strong reliability and much higher security, I strongly recommend you consider switching to peep.

Anyway, what peepin is, is a executable use to modify your requirements.txt automatically for you. It can do two things. At least one.

1) Automatically figure out what the right checksums should be.
2) It can figure out what is the latest version on PyPI.

For example:

(airmozilla):~/airmozilla (upgrade-django-bootstrap-form $)$ peepin --verbose django-bootstrap-form
* Latest version for 3.2
https://pypi.python.org/pypi/django-bootstrap-form/3.2
* Found URL https://pypi.python.org/packages/source/d/django-bootstrap-form/django-bootstrap-form-3.2.tar.gz#md5=1e95b05a12362fe17e91b962c41d139e
*   Re-using /var/folders/1x/2hf5hbs902q54g3bgby5bzt40000gn/T/django-bootstrap-form-3.2.tar.gz
*   Hash AV1uiepPkO_mjIg3AvAKUDzsw82lsCCLCp6J6q_4naM
* Editing requirements.txt

And once that's done...:

(airmozilla):~/airmozilla (upgrade-django-bootstrap-form *$)$ git diff
diff --git a/requirements.txt b/requirements.txt
index a6600f1..5f1374c 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -83,8 +83,8 @@ BeautifulSoup==3.2.1
 django_compressor==1.4
 # sha256: F3KVsUQkAMks22fo4Y-f9ZRvtEL4WBO50IN4I3IuoI0
 django-cronjobs==0.2.3
-# sha256: 2G3HpwzvCTy3dc1YE7H4XQH6ZN8M3gWpkVFR28OOsNE
-django-bootstrap-form==3.1
+# sha256: AV1uiepPkO_mjIg3AvAKUDzsw82lsCCLCp6J6q_4naM
+django-bootstrap-form==3.2
 # sha256: jiOPwzhIDdvXgwiOhFgqN6dfB8mSdTNzMsmjmbIBkfI
 regex==2014.12.24
 # sha256: ZY2auoUzi-jB0VMsn7WAezgdxxZuRp_w9i_KpCQNnrg
 

If you want to you can open up and inspect the downloaded package and check that no hacker has meddled with the package. Or, if you don't have time to do that, at least use the package locally and run your tests etc. If you now feel comfortable with the installed package you can be 100% certain that will be installed on your server once the code goes into production.