What I learned today: Not All Methods Are Created Equal

Today I wrote a Ruby script that, among other things, takes a text file as input and calculates its word count. One of the things I love about Ruby is that there’s always more than one method for doing things. But what I learned today is that not all methods are created equal. Let’s say we wanted to count the words in the following text:

What an excellent example of the power of dress, young Oliver
Twist was! Wrapped in the blanket which had hitherto formed his
only covering, he might have been the child of a nobleman or a
beggar; it would have been hard for the haughtiest stranger to
have assigned him his proper station in society. But now that he
was enveloped in the old calico robes which had grown yellow in
the same service, he was badged and ticketed, and fell into his
place at once–a parish child–the orphan of a workhouse–the
humble, half-starved drudge–to be cuffed and buffeted through
the world–despised by all, and pitied by none.

One way to accomplish this task is to call the scan method with the regular expression /\w+/ as its argument. scan walks through a string looking for the pattern it is given and returns every match in an array. So let’s say we store the Oliver Twist text above in a variable named text, use scan to find every run of word characters, then ask for the number of matches found. Here’s how that would look:

Ruby scan method
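Here is a minimal sketch of the scan approach. The variable name text comes from the description above; storing the passage in a heredoc and the name words are my own choices for illustration.

```ruby
# The Oliver Twist passage, stored in a variable named text.
# (The squiggly heredoc, <<~, needs Ruby 2.3 or later.)
text = <<~TEXT
  What an excellent example of the power of dress, young Oliver
  Twist was! Wrapped in the blanket which had hitherto formed his
  only covering, he might have been the child of a nobleman or a
  beggar; it would have been hard for the haughtiest stranger to
  have assigned him his proper station in society. But now that he
  was enveloped in the old calico robes which had grown yellow in
  the same service, he was badged and ticketed, and fell into his
  place at once–a parish child–the orphan of a workhouse–the
  humble, half-starved drudge–to be cuffed and buffeted through
  the world–despised by all, and pitied by none.
TEXT

# scan returns an array holding every match of /\w+/
# (one or more letters, digits, or underscores in a row).
words = text.scan(/\w+/)

puts words.first(3).inspect  # ["What", "an", "excellent"]
puts words.length            # 113
```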

Here, the scan method found every run of word characters (letters, digits, and underscores) and returned the matches in an array. The length method then returns the number of elements in that array, which is the number of words found. In this case, 113 words.

Another way to count the words stored in the text variable is the split method. When no arguments are passed to split, it splits the string on whitespace and returns the pieces in an array. Calling length on that result also returns the number of words stored in the text variable. This is what that would look like:

Ruby split method
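Here is a minimal sketch of the split approach, under the same assumptions as before: the variable name text comes from the post, while the heredoc and the name tokens are my own choices for illustration.

```ruby
# The Oliver Twist passage, stored in a variable named text.
# (The squiggly heredoc, <<~, needs Ruby 2.3 or later.)
text = <<~TEXT
  What an excellent example of the power of dress, young Oliver
  Twist was! Wrapped in the blanket which had hitherto formed his
  only covering, he might have been the child of a nobleman or a
  beggar; it would have been hard for the haughtiest stranger to
  have assigned him his proper station in society. But now that he
  was enveloped in the old calico robes which had grown yellow in
  the same service, he was badged and ticketed, and fell into his
  place at once–a parish child–the orphan of a workhouse–the
  humble, half-starved drudge–to be cuffed and buffeted through
  the world–despised by all, and pitied by none.
TEXT

# With no arguments, split breaks the string on runs of whitespace.
tokens = text.split

puts tokens.last(3).inspect  # ["pitied", "by", "none."]
puts tokens.length           # 107
```

Note that split leaves punctuation attached to each token ("none." rather than "none"), which doesn’t matter for counting but would matter if you wanted the words themselves.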

The split method returned only 107 words. Do you know why this may be? The reason is that /\w+/ matches only word characters, so scan treats hyphens and dashes as separators. The passage contains six joined pairs (half-starved, plus five pairs joined by dashes, such as child–the), and scan counted each of them as two words, when they should have been counted as one. That accounts for the difference of six. So it seems to me that using split can provide a more accurate method to determine word count.
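To see exactly which tokens account for the difference, here is a small sketch that uses only the last three lines of the passage, where all six joined pairs appear. The variable names excerpt and joined are mine, for illustration.

```ruby
# Last three lines of the passage; every joined pair appears here.
excerpt = <<~TEXT
  place at once–a parish child–the orphan of a workhouse–the
  humble, half-starved drudge–to be cuffed and buffeted through
  the world–despised by all, and pitied by none.
TEXT

# Keep the whitespace-separated tokens that contain more than one run of
# word characters; scan(/\w+/) counts each of these as two words.
joined = excerpt.split.select { |token| token.scan(/\w+/).length > 1 }

# Prints the six tokens, one per line: once–a, child–the, workhouse–the,
# half-starved, drudge–to, world–despised.
puts joined
puts joined.length  # 6
```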

What do you think? Do you agree that using the split method can provide a more accurate word count, or can you use regular expressions to achieve the same result? Leave a comment below. I’d love to hear your thoughts.
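For anyone who wants to experiment with the regex side of that question, here is a rough sketch on the last three lines of the passage. The two patterns are just starting points I picked for illustration, not a definitive answer: /\S+/ matches runs of non-whitespace, and /\w+(?:-\w+)*/ keeps words joined by an ASCII hyphen together.

```ruby
# Last three lines of the passage, which contain all the joined pairs.
excerpt = <<~TEXT
  place at once–a parish child–the orphan of a workhouse–the
  humble, half-starved drudge–to be cuffed and buffeted through
  the world–despised by all, and pitied by none.
TEXT

puts excerpt.split.length                 # 25
puts excerpt.scan(/\w+/).length           # 31

# Runs of non-whitespace: the same tokens that split produces.
puts excerpt.scan(/\S+/).length           # 25

# Words optionally joined by ASCII hyphens: "half-starved" stays whole,
# but the dash-joined pairs are still counted as two words each.
puts excerpt.scan(/\w+(?:-\w+)*/).length  # 30
```

On the full passage, the same two patterns should give 107 and 112 respectively, so which one is “right” depends on whether you treat the dash-joined pairs as one word or two.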

2 thoughts on “What I learned today: Not All Methods Are Created Equal”

  1. nserror says:

    The double-hyphenated words are rightly two words, signifying an appositive phrase. Both the scan and split methods are a valid means of word count, scan perhaps a little more so. If you want a more bullet-proof word counter, it is more likely that you’ll need to program a complex grammar, complete with the various exceptions that are rife in English. For example, “cul-de-sac” is considered one word, but “African-American” is two words (depending on your English teacher, of course). You’ve just got to define the scope of how accurate you want your code to be. Natural language parsing is considered an NP-complete problem, so I wouldn’t sweat it too much.

    • dannygarciame says:

      I’ve read differing opinions online and the majority of what I found concluded that hyphenated words should be counted as one, but your point actually makes more sense to me. I should have also considered the myriad of exceptions there are in the English language. Simply using scan or split wouldn’t be accurate enough but for the purposes of learning Ruby, I guess they suffice. Learned something new. Thanks a lot for your input! As someone learning how to code, it’s definitely appreciated!
