Sunday, June 29, 2014

Denouement

It's been fun. I've learned a bit about computer science, and quite a lot about how people work. I'm glad that I've recorded so much of my thought process, because I've learned things that I didn't even realize at the time, because I caught up to much in the compelling narrative I was trying to weave about myself. I highly recommend trying to make your own blog, even if nobody reads it. I also recommend sometimes putting much more care into things than neccessary, if only to see what happens.

If for some reason you'd like to contact me, you can reach me at paulproteus47 at gmail dot com.

Peace.

Monday, March 31, 2014

Understanding Efficiency

In this post, I'm going to focus on how to talk about efficiency of sorting algorithms. It seems that a lot of people are having trouble with it. Myself, I just think of it as math, and I find that I do well (I wonder whether the difficulties of others are due to them thinking of these problems as requiring guessing, but that's just my conjecture). Here, I present the subject of efficiency the way I would present it to myself if I were confused.

First, I'm going to use my favourite function as an example:

def string_together(x: 'list of str') -> str:
if len(x) == 2:
return x[0] + x[1]
else:
return string_together(y[:-1]) + y[-1]

Now, I want to analyze this mathematically. I'm going to assign variables for the amount of time that a computer will take to do a certain step:

X: Determining whether len(x) is 2. (call this 'step 1')
Y: Concatinating and returning x[0] + x[1]. (call this 'step 2')
Z: Concatenating and returning string_together(y[:-1]) + y[-1] (call this 'step 3')

Now, I do some arithmetic. To make notation easier, I'll name the initial length of x as n, and the total time taken as f(n). Lets require that n > 0 (only because of the way the function is written).

Lets say that len(x) is 2, the base case.
Then step 1 will be executed once, and step 2 will be executed once.
And so, the total time, f(2) is equal to X + Y.

Now, let len(x) be 3.
Step 1 will be executed, then step 3, and then there's a recursive call which will contain all of the steps in the above example (so, 1 and then 2).
Total time: f(3) = X +Z + X + Y = 2X + Y + Z

Now, let len(x) be 4.
Step 1 will be executed, then step 3, and then there's a recursive call which will contain all of the steps in the above example (1,3,1,2).
f(4) = X + Z + 2X + Y + Z = 3X + Y + 2Z

...

There's a pattern here. I generalize it as:
f(n) = (n-1)X + Y + (n-2)Z

And this is useful. For starters, if you want to find out the actual values of X, Y and Z, you can run a bunch of simulations, and do the linear regression (if you're into that sort of thing =p).

Question: When is f(n) large? (aka, when is the function slow)
Answer: f(n) is large when X, Y, Z, or n are large.
The thing is, I have some control over X, Y and Z. You know, because I've written my function to make them one line each. I don't expect them to vary a lot. I expect them to be small (but non-zero).
But n -- n could get as large as it wants.

New question: Can f(n) also get as large as it wants?
Answer: Yes. If you pick a number, I can choose an n big enough that f(n) will be greater than that number.
BUT, that said, there is a constraint on f. It's n. (and also X, Y and Z of course)

Question: So, what's a good way of expressing that f is constrained by n while still acknowledging that f is unbounded?
Answer: I can say that, given X, Y and Z,
I can pick some number C, and some easier-to-understand function g(n), such that no matter what number we try for n, it will always be true that C times g of n will be greater than f of n.

Or, for the mathematically inclined:

∃C∀nℕ\{1}, Cg(n) > f(n)
(Note that here, I proscribe n from taking non-integer values or values less than two)
(Gosh, this is ugly. I should learn LaTeX)

Anyways, to further simplify the notation, we say that the above is equivalent to the statement:

f is O(g(n)).

Let's return to the example.
f(n) = (n-1)X + Y + (n-2)Z

= (X + Z)n + (Y - X - 2Z)
So let g(n) = n, and let C = X + Z + |Y - X - 2Z | + 1.
Then
Cg(n) - f(n) = [(X + Z + |Y - X - 2Z + 1|)n] - [(X + Z)n + (Y - X - 2Z)]
= (|Y - X - 2Z | + 1)n - (Y - X - 2Z)
= (Y - X - 2Z + 1)n - (Y - X - 2Z) OR = (-Y + X + 2Z + 1)n - (Y - X - 2Z)
[Here, I split the problem based on whether Y - X - 2Z is positive or negative.]
= (Y - X - 2Z )(n - 1) + n || = (Y - X - 2Z)(1 - n) + n
> n [both factors are positive] || > n [both factors are negative]
> 0 || > 0
Therefore, Cg(n) > f(n) for all n; f is O(n).

That was fun. 
The moral of the story: you only have to care about terms that make f larger as n gets larger. 
And, though I haven't quite shown this: if there are more than one, you only have to care about the one the grows fastest with large n.

What about sorting a list, then? What do I know about sorting algorithms?

Well, sorting algorithms operate by comparing values to see which is larger, then moving them if appropriate.
This moving actually complicates this problem, because there's a practical difference between sorting a list in place, and making a sorted copy, I'm going to ignore this here.

Further, a single algorithm, might take different times for different lists, given that more operations might be required. An easy way to think about this: an already-sorted list does not require any movement to be sorted. 

I'm going to consider some possibilities...

What about some magical, ideal sorting algorithm? Well, it's going to have to do at least n - 1 comparisons -- just enough to check that an already-sorted list is sorted, by checking each element against the next one. 
In the best case, it won't have do to any moving, because the list will already be sorted.
From this, we see that this magical, unfettered algorithm will take n -1 of these steps, and so will be O(n).

What about an equally contrived, magically-awful algorithm? Well, it shouldn't have to do any single comparison more than once. So, let's say it does all possible comparisons. That's (n - 1) for the first element, (n - 2) for the second, and so on. Overall, that's n(n - 1)/2. 
In the worst case, each element will have to be moved individually to its spot -- except for the final one. In this case, you need n - 1 movements. Hm, maybe that's not bad enough, actually. Ok, how about every time you move one item, you have to move all the other unsorted items, too. Then this also takes n(n - 1)/2.
From this, this algorithm will take n(n-1) = n^2 - n of these steps, and so will be O(n^2).

So you see, without even considering any specific algorithms, we can expect all reasonably efficient algorithms to operate between O(n) and O(n^2). Further, these cases suggest that comparisons will generally be more important than the actual moving (not guaranteed).

[Author's note: please don't confuse my example of good//bad algorithms with the concept of best//worst cases for a single algorithm.]

Last question: I've done some reading on the subject, and O(nlog(n)) keeps coming up. Why?
Answer: Let's look at a real example.

Consider mergesort. For this algorithm, divide the list into two, recursively sort the halves, then combine the two in order. How many comparisons does it do? Approach this question the way you'd approach any question about the behaviour of a recursive function.

Base case: For n = 2, only one comparison is needed.

For n = 4, the group is split in half, then 1 comparison is done in each group, then 2-3 additional comparisons are required to merge them. (To merge, take the lowest element from each group, take the lowest of them as the lowest overall, and repeat. This can't take fewer comparisons than the number of items in a group, but it also can't require as many as the number of items in the two combined.) A total of 4-5 comparisons are required overall.

For n = 8, the group is split in half, and 4-5 comparisons are done in each half. Then merging takes 4-7 comparisons, for a total of 12-17.

...

In conclusion, the best cases take {1,4,12,...} and the worst cases take {1,5,17,...}.
I can describe these by functions which I'll call b(x) and w(x), for best and worst. I understand them best as they operate on powers of two, so that's how I'll write them. For N = 1,2,3,... :

Recursively:
b(2^N) = 2*b(2^(N - 1)) + 2^(N-1)
w(2^N) = 2*w(2^(N - 1)) + 2^(N) - 1

After some math:
Well, actually, I can't do the math. But look at the patterns:
b(2^N) gives {1, 1*2+2, 4*2 + 4, 12*2 + 8,...} 
                       = {1*1, 2*2, 4*3, 8*4,...}
w(2^N) gives {1, 1*2 + 4 - 1, 5*2 + 8 - 1, 17*2 + 16 - 1,...} 
                       = {2*0 + 1, 4*1 + 1, 8*2 + 1, 16*3 + 1,...}

Which leads me to a conclusion:
b(2^N) = ((2^N)/2)*N
w(2^N) = (2^N)*(N - 1) + 1

Let n = 2^N. Then N = log(n). And:
b(n) = (n/2)*log(n) = (1/2)*nlog(n)
w(n) = n(log(n) - 1) + 1 = nlog(n) - n + 1


And so b(n) and w(n) are both O(nlog(n)). Tada!
[Author's note: an earlier version had 'O(n)' here.  That was merely a clerical error, sorry for any confusion.]

But why???
In a nutshell, because log(n) is the time to find an item by binary search (this is because binary search eliminates half of all possibilities with each step, because doubling the number of elements to search will add only one more step, etc, etc), and n is the number of items to find. Mergesort doesn't quite have this cut-and-dry ~find one item, then find the next~ -style, but I'm now wondering if they aren't really very similar. But I'm fishing here. However I do know that the reasons are similar. Hmmm...


That's all for today, folks. I hope your midterms went well, and I hope you're not afraid of your finals.

Until next time, may you ask a bunch of questions, do the math and then ask more questions.

--David

Sunday, March 23, 2014

Marking

Here's the deal. I've found a hole in the course.

Visit this. Notice how the question has been marked as "good" by multiple people, but the instructors' answer was not. Also notice how lots of students have asked follow-up questions, none of which have been addressed. And it's been a month.

Here is the SLOG handout and the complementary (and difficult to find) rubric. I'm going to go over the main points that they make and try to clarify them for everyone. In addition I will try to relate them to motifs that I've been using throughout my slog, to (mainly for my own benefit) pick out things that I've done that are on the right track.

Dear readers: if you think I've missed something, or got something wrong, you've got to let me know in the comments so that we might see eye-to-eye. If I don't hear from you, and you've read this, I'm going to assume you agree with me -- I'm doing this here since feedback is obviously important when it comes to course structure. Ready? Here we go. To be honest, I should have done this a long time ago.


The Handout:
  • Record "milestones" in the course. I assume this refers to assignments and tests, etc. Sounds fine.
  • Record difficulties, with the steps you've taken toward resolving them. No pity-parties. Got it.
  • Record reaction to course material. This could suggest a couple of things. Extensions of class concepts seems like the obvious choice to talk about. Difficulties fall under this bracket also (if you have an unresolved difficulty, I would say you might have a reaction to it.
  • "Anything else you decide is relevant". So, anything we want (given that the relevance is up to our discretion).
  • Write for the teaching staff. Give them something relevant to them -- they want to know how you're doing, so let them know.
  • Write for your peers. Give them something relevant to them -- I've already talked about this at length here.
  • Write for your future self. Give something relevant to your future self -- that is, make sure to record your thought processes so that, as they change, you'll be able to pick out the differences between now and then.
  • Write one or two paragraphs on the assigned topics; there are three topics total.
  • Write weekly. These slogs are actually meant to be visited. Readers don't like it when they leave for a month and nothing happens.
  • Answer some cookie-cutter questions if you're stuck. This is not particularly useful advice, given that you have already been asked to write about "anything else you decide is relevant".
  • Pay attention to readability. That is, be reader-friendly.
  • Allow comments. Respond to commentors. Obviously.

The Rubric:
  • Keep consistent, dense, well-organized "records". They're not too clear what "records" refers to, but I think it follows that they are our slog posts. In any case: be consistent, post frequently, and don't try to confuse your readers.
  • Give an account of your encounters with the course material; offer your personal reactions, as well as the methods you used to begin to deal with those reactions. The emphasis being on the response to our response, I believe they are looking for an exposition of critical thinking skills within the context of this course. Sounds reasonable. So, no diaries, no pity-parties. Also, though they don't demand a focus on the course material, they do insist that it be there.
  • Your writing style should only make your slog better. Those readers, they're important!
  • Allow for and respond to feedback; promote awareness of others' slogs. These are not difficult things to do. Obviously they are not asking for the kind of focus on slog awareness that I have in mine (though they don't proscribe it, either). The idea, I believe, is that we're part of a community: a large part of our audience is the other students, so we should be able to actually reach them. (Note: this page does not seem to me like a good way to promote this outreach.)
  • On the assigned topics: go beyond lecture material, be clear, be organized. So, no checklisting, and no confusing your readers. Again, they're not demanding a focus on course material, they're looking for critical-thinking skills. Here I'll offer my own, broad approach: introduce the concept as briefly as possible, ask a question about the topic or one which you can use the topic to explore, and then provide some possible answers for the question.

The Piazza Discussion:
  • Student: "I think everyone could use more feedback". Feedback is clearly a good thing, with the community structure and the requirement of allowed comments. Myself, I'm not getting very much of it. I could agree with this guy.
  • Professor: "The handout says you should record your reaction to concepts in the course, and presumably that includes explaining your understanding of those concepts." Did anybody else get this? In fact, it was the handouts themselves that just lead me to the conclusion that checklisting is bad (from the rubric: "regurgitating reading or lecture material" will result in a 50%  -- presumably those contain explanations of the concepts). Actually, I even wonder if his answer here might be wrong, that the slogs are not actually marked based on this sentiment, given what I've read in the course documents. I'm going to say that the jury's still out on this one. 
  • Professor: "I would be happier with an average of 75% than 50%." I'm glad we're on the same page here.
  • Professor: "Computer Science is more than hacking.  You'll need to communicate about our discipline..." This. This is important. I have no doubt that this is the philosophy behind getting first-years to blog about their studies.
  • Student: "I... think that the marking for our blogs is unfair... [the feedback] suggests me to 'try to provide definitions of concepts...' Not sure about what the rest of the class feel, but I really don't agree with this suggestion.... I'm not writing a textbook." This is the proclamation of a strong opinion. I have no idea why none of the professors or TAs decided to respond to it. I, personally, would like to read a response.
  • Student: "I don't understand why we need to define certain things that can be found with a tiny bit of research.  If we explain our feeling[s]... I don't see why full marks wouldn't be rewarded." This here is my personal viewpoint, stated quite well by my anonymous cohort. As with this slogpost, I am going to assume that a prof/TA has read this comment (because feedback is important), and I'm going to assume that this student's opinion is correct, because that reader had no comments to give.
That's my two cents from up on my soapbox. Dear readers, I have a blogroll now (on the right), as well as a wondrously happy comment-box down below. Hint, hint. Seriously, though, this stuff's important. We're not just hackers, hm?

Peace.

--David

Sunday, March 16, 2014

Helper Functions

Three weeks ago I examined one strange topic that occurred while using helper functions. I published the post and thought I was over and done with the subject. But then I came across this post on piazza, I realized that there was territory here which had been approached, but not explained thoroughly.

My question is: what are the different ways of using helper functions, particularly when the helper is recursive?

As an example problem: take a list of strings, stick them all together, in order (recursively). Recall the recursive string-sticker-together algorithm:

def string_together(x: 'list of str') -> str:
if len(x) == 2:
return x[0] + x[1]
else:
return string_together(y[:-1]) + y[-1]

Now, the task: given a list of strings, string them together, then returns a tuple of three copies of that string three times in a single line. Yes, there are going to be easier ways to do this: I'm only considering options that require a recursive helper function. I'll call this function add_strings.

I'm going to investigate options, considering what happens in a shell after calling:

>>> from xx import add_strings


Method one: the module-level helper

The idea here is that the helper is just another function defined in xx. Standard stuff.

def add_strings(y): y = string_together(y) return y,y,y def string_together(x): if len(x) == 2: return x[0] + x[1] else: return string_together(x[:-1]) + x[-1]

The first thing you should know is that a syntax error in string_together will cause the line

from xx import add_strings

to raise SyntaxError. Actually, the existence of any syntax error in the file will cause that import to raise SyntaxError.

So the specific thing you should know about this method is that the line

from xx import add_strings

does not import string_together. That is, if you then try to do anything with string_together, you get a NameError.


Method two: the nested helper

This is very similar to the previous method; the only difference is that the helper is located inside the main function.

def add_strings2(y):
    def string_together2(x):
        if len(x) == 2:
            return x[0] + x[1]
        else:
            return string_together2(y[:-1]) + y[-1]
    y = string_together2(y)
    return y,y,y

This seemed to have exactly the same behaviour as the previous method, so I looked it up. To summarize my research, nested helpers are useful  for functions that return functions, helpful in terms of readability, but otherwise serve no additional purpose. But hey, readability is good.

On the other hand, if you have to run these functions, say, ten million times, the nested helper version is actually a few seconds slower. This is due to (I presume) string_together having to be redefined at each call of the function.

Method three: the "modal helper"

This is a bit harder. Also, there might be a better name for it (but it looks modal to me), and it's about as tried-and true as "Meta-Recursion". Still, it works. The idea is that the function is its own recursive helper, but only on the recursive calls.

def add_strings3(y, helper=False):
    if helper:
        if len(y) == 2:
            return y[0] + y[1]
        else:
            return add_strings3(y[:-1], True) + y[-1]
    else:
        y = add_strings3(y, True)
        return y,y,y     

This has the advantage that there is no internal function that you might not have access to. It's comparable in speed to the module-level helper. Is it more readable? Probably not, but it's simple.


Random slogs of the week!  (explanation here)

C - Average. Probably not worth looking at, unless you want to read all of the slogs.
B - Above Average. One or two neat things. Maybe try it  if you're looking for content on those particular topics.
A - Excellent. Lots on interesting ideas that aren't easy to find elsewhere. Probably worth visiting; I'd expect to find more interesting content here in the future.
D - Below Average. Not reader-friendly. Maybe take a look to make sure you're not making the same mistakes.
F - Fail. Fatally flawed; for example, not having any content. I hope I don't find any of these.


  • I love the writing style. Light and entertaining.
  • It's really a diary, a collection of feelings. The most interesting of those I've come across thus far. Also, diaries are better than checklists.
  • It's really short. I've said it once, I'll say it again, having short posts is not inherently good or bad. Here, however, where the content is not complex, I could do with a bit more.
  • The updates are really sparse. :(
  • Rating: D+   It was just unique enough that I hope he/she'll come back...
csc148 slog by muc
  • A bit of a diary, but not too much
  • Talks about time management, or at least the need for it.
  • Suffers from being more explicit in the problems than the solutions. 
  • Posting not weekly, but frequent enough I'm confident it'll update in the future.
  • He's taking real analysis!? A math major myself, I'm still impressed.
  • Rating: C+   The only flaw is that there's nothing... intriguing about it -- I'm not convinced that you'll thank me for it ("neat things!") if I recommend it to you.
SLOG1 by Rashid Gaziyev
  • Two posts: OOP, and recursion. Man, this person is determined to do the bare minimum. On the other hand, I'm fairly confident that there is to be at least one future post.
  • The first post is a checklist. Literally. It answers questions one-by-one as listed on the slog handout. I would say that he had missed the part about discussing the way the course is being operated -- except that he seems to have addressed that, too. Thorough.
  • The second post is in a completely different (I guess he got feedback) and much better style. He gives an original (I think) example, always good. Beyond that, a lot of fact-stating.
  • Rating: C+   I really am not impressed, but I just can't come up with more justification.
CSC148 SLOG by GR
  • Talks about course material, and goes beyond lecture content. That is what I'm looking for.
  • The author is even clever, at least a few times. Nice.
  • Reading the first posts: the author really needs to learn how to break down their paragraphs. Reading the newer posts: the author did learn that.
  • A couple small grammar errors, inconsistencies, could use cleaning up.
  • A reference to another slog. Good.
  • Long, long posts (damn, I suffer from that, too)
  • Rating: A+
CSC 148 Winter by Wenfeng. J
  • Whoa, look at that template. Unsure whether I like it, but it stands out.
  • Another unhelpful TA comment. I'm going to assume that the post was changed retroactively, because that comment doesn't really seem relevant.
  • A bit of a diary again. The author does a good job of relating everything back to the course, for the most part.
  • Tells us what his difficulties are, doesn't tell us about his approach or resolution. (This is recurring, isn't it. I'm going to need a clever name. How about... "pity-party"? Yeah, I think that gets the message across...)
  • His examples are confusing :( More explanation would be helpful.
  • He seems to only post technical requirement for the specific assigned topics (the complaint being that most of the posts feel very different from the two technical ones).
  • The most recent post is formatted differently from the others. And it's ugly. I wonder if the author realizes.
  • Rating: C+
That's all for now. Until next week, may you use combinations of Google and firsthand experimentation.
As always, I anxiously await any and all comments.

--David

Sunday, March 9, 2014

Test one

Let's start with a shout-out to B over at CSC148H Slog. Well, a few weeks ago I reviewed his slog. To recap, I said I liked what he what trying to do, but it seemed that he was letting the un-fun TA and the echoing silence get him down; this resulted in his posts ceasing to have interesting analogies, instead becoming shorter and shorter. Somehow he seems to have found me (good on you!) and, further, he seems to have taken my advice!

He wrote a longer post, sure. But it's the content that matters. He carried out an interesting example; not much philosophizing that week, but I'll deal. More importantly, he's put out some general statements, something which he had neglected to do previously. He talked about slogs, saying he appreciates seeing slogs with "ideas that we don't encounter in class", saying he has come to "value CSC148H's slogging community"(!), and he actively solicited comments.

And he told me I was being supportive. And then he made this reply to the TA. Wow. I feel good now. I mean, it's nice to know that, when you reach out, you can actually touch someone. This literally made my day. Thank you, brave soul!



OK, I've got a few words to say on the first term test. First of all, it was fairly easy (this is not just subjective -- the average was indeed high). To be perfectly honest, I guessed this would happen.

Second, the cheat aid sheet was not useful. To be honest, I also guessed this would happen. I mean, I did write the thing. I just didn't find I had to refer to it. I asked a few others, and this seemed to be generally the case (if you know of someone who found theirs particularly useful, I'd like to hear why). The majority of my sheet was taken up by examples of how to use unittest, which of course didn't come up at all. Man, unittest is annoying.

I found it interesting that there were four different versions of the test; mainly, surprised that this just never seems to happen elsewhere (when I hear that there are different versions of a test, it without fail means that the multiple choice questions are asked in a different order). Then again, maybe this was necessary, given the lack of space in the room, so that attempting to cheat would be unhelpful. Anyways, the professor (Heap) mentioned that he would magically adjust the marks so that each version had the same average. Liking stats, I asked him about this. He seemed, um, rather cynical when I asked this, his eyes shifting as he asked whether this was a complaint. He told me that the variance was very low, particularly for the first question (on tracing a recursive function -- the low variance is probably due to particularly high marks overall on the question, but that's just my conjecture). That's not much to go on. But there you go, I guess that's how the course works. If I were a prof, I would totally make this kind of information publicly available, and I would be excited to find that [a] student cares about it.


Random slogs of the week!  (explanation here)

C - Average. Probably not worth looking at, unless you want to read all of the slogs.
B - Above Average. One or two neat things. Maybe try it  if you're looking for content on those particular topics.
A - Excellent. Lots on interesting ideas that aren't easy to find elsewhere. Probably worth visiting; I'd expect to find more interesting content here in the future.
D - Below Average. Not reader-friendly. Maybe take a look to make sure you're not making the same mistakes.
F - Fail. Fatally flawed; for example, not having any content. I hope I don't find any of these.

CSC148 SLOG Winter 2014 by Elizabeth
  • Pretty! The text seems carefully groomed.
  • The posts are short, which is more because the author doesn't ramble than anything.
  • The tone is cheery and rather a pleasure to read.
  • She talks about stuff that's strictly course content... for a sentence or two, and then provides her own angles on it. That, I think, is exactly what they expect us to do (why have so few people been doing it??)
  • Rating: B+   Solid, but not outstanding; I feel like there's some unreached potential here, that maybe the next post would be even better. I think i'm going to check back, just in case.
CSC148 by Erica M.
  • Very much a diary. Normally, I shrug off this stuff as checklisting -- but she's so thorough! Also, she does give her opinions, so she is technically following the slog guidelines.
  • There's a font change halfway down the page. I can understand why, the first font is really awful. Personally, I'd recommend changing those retroactively.
  • Rating: B-   I suppose that if you don't have a basic understanding of the core course concepts, this might actually be a good place to dive in. It really is thorough.
csc148 SLOG by Labiba Chowdhury
  • Lots of reflection going on here.
  • But not much posting.
  • Grade: D   The only post was okay at best.
CSC148 SLOG by Ramsey
  • Lots of well-documented confusion and complaints. 
  • Unfortunately few notes of resolution.
  • Raises a few interesting questions (leaving them open)
  • Short posts usually neither good nor bad, but here, definitely an asset.
  • A hidden reference or two (or maybe a ton, that I'm missing)
  • The author dislikes unittest, too =p
  • I like the formatting.
  • Rating: B   It's a light read, so no harm in trying it.
  • Reads like lecture notes.
  • Tiny font :(
  • Only one post :( :(
  • Rating: D   Not much to say here.

Well, that's all for today. I've finally exhausted my backlog of topics to write about, and I'm looking for more. 

O mysterious readers, please comment, lest I be forced to shout alone into the void. <3
Or you could slog about me. That's cool, too... =p

--David

Sunday, March 2, 2014

Even more slogs

So as I alluded to last week, my friend over here gave me idea. You guys are all writing slogs of your own, supposedly about things that might be interesting to me, but as he/she said, "what about when I ... want to find out what others have said [about a specific topic]?"

Well, the answer of course, is to write another script. I mean, I already know how to get python to read page-source. And I already have your slog addresses, all of them. So what am I waiting for? Break out the urllib!

slogdict = {}

slogfile = open("Slogcontent.txt", "w")

for slog in urls:
    print(slog)
    try:
        x = urllib.request.urlopen(slog)
        lines = x.readlines()
        lines = [str(i) for i in lines]
        slogdict[slog] = lines
    except:
        print("FAILED: {}".format(slog))

slogfile.write(str(slogdict))
slogfile.close()

Yeah. Well, now I have all of your slogs.

Well, almost. There are a few of you whose slogs don't seem to work. If so, I don't have yours (but you might not either, so I guess we're even).

I'm also missing some content, for those of you who have blogs for which not all content is visible from the first page, for example, this beautiful creation. Oh well.

Anyways, I've now created this dictionary, one that associates everyone's slog with its contents, and written it to a file. I can read this file, get the string, and then convert it back into dictionary! Right? Right? Well, look at this simple example:

>>> x
'{1:11,2:222,3:3333}'
>>> dict(x)
Traceback (most recent call last):
  File "<pyshell#8>", line 1, in <module>
    dict(x)
ValueError: dictionary update sequence element #0 has length 1; 2 is required

Oh.

So, google being my friend, I went to find the fastest solution I could.
Here is what I found:

>>> from ast import literal_eval as ale
>>> ale(x)
{1: 11, 2: 222, 3: 3333}

As usual, there's probably a better way, but hey, it works. And as far as I'm concerned, there are harder ways.

So now I've got the dictionaries. Now I clump them together, and make a dictionary of word frequencies, like we did in 108. Except harder, because I'm trying to remove things that aren't actually content, all the "WebFontConfig" and "<title/>" and stuff. And while I managed to parse through it, I don't think I did a particularly good job. Some of it had to be done at the end, simply brute-force removing words like "Powered", "By","Solid" and "Normal". Ah well. I won't be sharing it (unless of course you ask nicely, in which case I'd be happy).

One thing I've done is to implement a simple search engine. I type in a string, and a list of all of the blogs which contain that string are returned to me. Cool, eh? All this power! I wonder what I should do with it...

Well, I now present the result of automated frequency analysis, the answer to the question, "What is it that my classmates are talking about?" Here is the list of, from most- to least-used, the top fifty most-blogged words:

['THE', 'TO', 'A', 'OF', 'I', 'IN', 'IS', 'THAT', 'IT', 'THIS', 'WE', 'FOR', 'WAS', 'AS', 'WITH', 'MY', 'BE', 'ON', 'CAN', 'HAVE', 'BUT', 'ARE', 'RECURSION', 'YOU', 'AN', 'ABOUT', 'CLASS', 'MORE', 'SO', 'WHICH', 'HOW', 'PROGRAMMING', 'WHEN', 'WHAT', 'FROM', 'ONE', 'AT', 'WILL', 'CODE', 'ME', 'FIRST', 'OR', 'ALL', 'SOME', 'WEEK', 'WOULD', 'RECURSIVE', 'LIKE', 'OBJECT', 'SLOG']

Hm. I don't know about you guys, but I found that unenlightening. Maybe I need a better question...

Life's making me a little busy (this code that I talked about here? I wrote it back during reading week). I'm taking a week off from random slogs. Until next week, I really am looking for questions to answer with my search engine, so if you have some, just give me a shout.

--David

Sunday, February 23, 2014

"Meta-recursion"

Far too many times during Assignment 1, I found myself wishing I could make just a tiny change to GUIController, knowing that I wasn't allowed to, and getting all hung up on that. I realize that, in the real world, there will certainly be times in which I would have to build something to work with someone else's untouchable piece of code. But the thing that irks me is that the scripts I had to write were needed to make the GUI functional, and yet this piece of framework for the untouchable-and-perfect GUI was not documented in what it was supposed to do. But enough about that assignment, that's long gone.


So I learned something interesting about recursion, by way of happening upon something confusing in my lab. Because I like super-simple examples, imagine the following code:

def add_strings(x:list of str) -> str:
""" Takes a list of strings, and sticks them together.
Recursively.
"""
if len(x) == 2:
return x[0] + x[1]
else:
return stick_together(x)

def stick_together(y: list of str) -> str:
return add_strings(y[:-1]) + y[-1]

Okay, before some of you get mad, let me just say that I chose a simple example for a reason.

For the rest of you: what is going on here? One function calls another, which calls the first one. Could this be, *gasp*, meta-recursion???
Let's break it down. Consider this function:

def add_strings2(x: 'list of str') -> str:
if len(x) == 2:
return x[0] + x[1]
else:
return add_strings2(x[:-1]) + x[-1]

Well that was easy. But see, these two are the same function. The secret is that stick_together is a helper function -- one that contains the recursive call. This was really not obvious to me the first time I encountered it, because the functions were a lot more complicated (after a good deal of thought, we were able to make it into a single function). I can't decide whether there's a good reason to know this or not, but now, it just seems trivial.


Ok another recursion question: when do you NOT want to use it?

Let's use the recursive list-summing function as an example:

def r_sum(nested_num_list):
    tot = 0
    for element in nested_num_list:
        if type(element) == type([]):
            tot += r_sum(element)
        else:
            tot += element
    return tot
[Stolen from this page linked to from the course website]

The obvious answer is that you don't need recursion when a simple for-loop will do. If you know that your input will never be nested, then writing this function is far to complex for what you really need to do (hint: try the builtin sum()).

I can think of another reason, though:

>>> mylist = [0]
>>> mylist += mylist,
>>> mylist
[0, [...]]
>>> mylist[1]
[0, [...]]
>>> mylist[1] is mylist
True

This is a list containing a reference to itself. mylist is [0,mylist]. The sum of these elements has got to be zero, right? Because it contains no non-zero numbers? Well, try it.

>>> r_sum(mylist)

Yeah. This problem is a little too big for recursion. Sorry.


RANDOM SLOGS OF THE WEEK! (explanation here)
The unambitious marking scheme:
C - Average. Probably not worth looking at, unless you want to read all of the slogs.
B - Above Average. One or two neat things. Maybe try it  if you're looking for content on those particular topics.
A - Excellent. Lots on interesting ideas that aren't easy to find elsewhere. Probably worth visiting; I'd expect to find more interesting content here in the future.
D - Below Average. Not reader-friendly. Maybe take a look to make sure you're not making the same mistakes.
F - Fail. Fatally flawed; for example, not having any content. I hope I don't find any of these.

Third Week by VVXiyang Rain
  • Well, there's only one post. Strike one.
  • The post starts with "OK so this is the first time that I write my slog..." Helpful.
  • And it ends with "Hope things will get better." Well, so do I.
  • Actually the middle is pretty odd as well, it contains ambiguous statements such as "But sadly I did worst in Class in 108". 
  • Seriously though, this is very "I did this today"-style. Like going down a checklist: I did this, then this, then this, etc. Nothing exciting.
  • There are no spaces between sentences. Anywhere. Seriously?
  • The title. "Third Week". Is that a reference to the fact that he hasn't used it since the third week of the course, the date of the only posting commemorated forever in the url? I mean, I can only hope it's not a Nazi Germany pun. 
  • Rating: D-    I've had enough.
  • Neil brings a very out-of-field approach to computer science. For those of you who are struggling to find interesting things to write about, I bet you could take a few pointers from this guy.
  • I like his writing style. :)
  • Rating: B+   And a good one at that. This could easily be an A.
CSC148 SLOG by TheCoder
  • Not a lot of content (two posts overall)
  • TheCoder does a lot of reiterating concepts. However, the author does a good job of being concise and relevant.
  • TheCoder does a lot of train-of-thought checklisting, another thing which is generally annoying. Again, the author does a good job of being concise and relevant with it, keeping it in check.
  • Rating: C+   I get the impression that this person could write a good blog, but doesn't want to do one on this topic.
  • Snazzy! The blog looks good. And that is definitely a plus.
  • The content is the most checklist-y I think I've encountered so far. The most reflective comment I can find is "It looks cool".
  • A general note: don't say that you hope you will be able to solve a problem -- and then never tell us. The interesting part comes in hearing how the difficult thing was solved.
  • Another note: you don't need to tell us that you find something confusing. That's really not interesting or meaningful to read about. But the things is, I'm pretty sure that I've done that myself, here. So I'll leave that as a takeaway for everyone, and try not to hold it against the author.
  • The subtitle says that he/she dedicates the blog to keep track of, among other things, "how I overcome every problems [sic]". Typo aside: the blog simply doesn't do that. It talks about lots of problems, but almost no solutions. I disapprove of lying to your readers right at the top of the page. 
  • Rating: C-
CSC148 by Angel
  • No content. Not even hummingbirds can save you from that.
  • Rating: F

If you have questions about my stuff, hate me because I insulted your blog, or just fancy a chat, feel free to comment.

--David

Monday, February 17, 2014

More and more slogs

Morning all!

I'd like to start with a shout-out to spinningmind over at Computer Science Journal for putting in some favourable words for me. Further, he/she discovered my blog while writing a post, which he apparently published the day before I published the page of mine to which he linked. Impressively precocious! Things that this person likes include talking about slogging, linking to other slogs, browsing "a random sample of the SLOGs listed", and actually having interesting content, all of which are quite commendable.

See, I've actually learned something from this person. Like me, spinningmind started by asking themselves a question about slogs: in this case, "what about when I'm writing on a specific topic and want to find out what others have said about it?" I never thought about that. That's an interesting question. I would actually like to be able to tell you what everyone is writing about. That wouldn't be that hard to answer, would it? I'm gonna come back to this one...

Anyways, here's what I'm doing. As outlined here, I'm going to generate a random sample of the slogs to visit. As outlined there, I want to spend my time here writing about slogs, because that's something you should all be interested in; further, talking about the slogs of specific, random other people has got to be socially acceptable, because the profs have more-or-less decreed it so (as they expect us to find slogs to talk about from the listing). Well, it's about time I actually got around to doing any of that. Onward!

I'm going to review random slogs. And I'm going to make this a thing, do it all the time. Just for you guys, of course. I'm not a grader or an expert, so I won't be doing anything fancy. I want to write a few sentences on each. About anything that would make it notable compared to the hundreds of slogs out there, or anything I dislike that you would probably want to fix if you were the author (I'm going to be picky). Finally, I will rank them, so that you all know where they stand, and which ones are most worth your time. I'm hoping to learn what it is that people write about, and what it is that makes good slogs good and bad slogs not-good. And that should be relevant to all of you -- and to me, too.

An unambitious marking scheme:
C - Average. Probably not worth looking at, unless you want to read all of the slogs.
B - Above Average. One or two neat things. Maybe try it  if you're looking for content on those particular topics.
A - Excellent. Lots on interesting ideas that aren't easy to find elsewhere. Probably worth visiting; I'd expect to find more interesting content here in the future.
D - Below Average. Not reader-friendly. Maybe take a look to make sure you're not making the same mistakes.
F - Fail. Fatally flawed; for example, not having any content. I hope I don't find any of these.

What am I waiting for? Break out the script!

AlexT.148Log by Solo.Player.
  • Short entries (unsure whether or not this is a good thing).
  • A focus on class content, lots of suggestions on how to get more comfortable with it. 
  • Lots of records of personal thoughts, like a train-of-thought to-do list; reads a bit like a diary. 
  • Changed font for his most recent entry for no apparent reason (try to keep it consistent).
  • Rating: C+

CSC148H Slog by B
  • Draws really cool correlations between course concepts and philosophical concepts.
  • The first post in particular presents some legitimately interesting ideas that have enough depth to them that he probably could have written much much more on the subject had he chosen to flesh it out.
  • Received a poorly-thought-out TA comment on that post, recommending that he write posts with fewer tangents rather than more ideas. B seems to have taken that to heart, the posts get shorter and shorter, with the most recent being only a single paragraph. I personally am of the viewpoint that sufficiently interesting tangents are nothing but bonus material.
  • The blog is made up of well-structured paragraphs. Given this, I'm surprised that B even managed to write something that "do not aid this goal", as the TA said. Maybe pay a little more attention here.
  • The text is a little content-dense for a blog (long sentences, little to no filler). I'd recommend splitting up paragraphs into smaller pieces.
  • Rating: A-     --I'll be checking back.
CSC148 sLOG by godough
  • A single paragraph of content. Huh.
  • At least, a straightforward, clean writing style.
  • The paragraph is far too big, longer than any I've written here. He could have at least pretended to have more.
  • Rating: D
CSC148 Winter sLOG by Tailong He
  • Really short entries. Honestly, that's neither good nor bad.
  • Talks about his labs.
  • Reads like a checklist. Today he did A, B, C. Last week, D, E, and F. Nothing too exciting.
  • Rating: C
  • "Nothing Found". Oh dear. I guess he/she really can't carry on. Classy.
  • Rating: F

That's all for today, folks. I'd love to hear from you in the comments :)

--David

Sunday, February 9, 2014

SLOGs (bonus: urllib tutorial)

Alright. I'm gonna maybe rant a bit, but I've got some good stuff for you guys, I promise. (I suppose you can just skip down if you really must...)

This course is challenging us to do exactly what I'm doing here, to blog about our experience with the course. And exactly that: we're not here to blog about computer science, as we're not qualified. We're here to write about what we are qualified to write about. And so while we happily go about ourselves, writing about our own personal experiences with our own personal introductions to the world, at some point it occurs to us that the introductory-computer-science-experience blog is kinda niche, as far as blogs go. Who reads stuff like that? (Who's my audience? I care about that.) People who've always wondered what it's like to take a computer science course, but never have? Or maybe people who are currently taking an introductory computer science course, and are looking for emotional validation?

But wait -- there's more! Have you seen this page? Man, there are a heck of a lot of people blogging about the exact same thing as me. Surely there can't possibly be enough people interested in these that they'll all get read...

Oh, right. "...your peers in this course will read your journal, and compare your experience to theirs". Okay. So you guys are pretty much all sloggers, just like me. I can deal with that. Sweet.

Wait. We're supposed to find other slogs... by using the list of slogs. So we're going to anonymously refer to each others' expositions of emotional validation. That sounds awkward. No, it can't be, that's the idea. Right?

Alright. New question. I don't want to read all the slogs. There are far, far too many. But how do I pick? You know, I have a strong, strong suspicion that the most referenced slogs will be the ones at the top of the [alphabetical] list. Let me do a quick check... none of the top four have comments. Hmmm... Okay, maybe the ones with clever names will be popular.. checking... nope, I guess that isn't true either. Alrighty, or maybe there just isn't a lot of commenting going on (so maybe those sites are actually the most visited, but I don't know how to check that). I hear procrastinating is a thing.

Well, never fear! Now I know my audience! You're all looking for slogs to comment on! Aha! Aren't I clever?

No, seriously, let's give you some. But no bias. I'm not going to just pick the ones at the top. Every so often I'm going to go through that giant list, pick a few at random, then evaluate them. That sounds a bit tedious... so let's automate it.

import urllib.request

Now, as you already know, I'm not an expert on this, and I'm going to cheat a bit maybe, but I want to show you that python can indeed access webpages. This is how you can do it. This works.

slogsite = "http://www.cdf.toronto.edu/~heap/148/W14/slog.html"
x = urllib.request.urlopen(slogsite)
lines = x.readlines()

And there you have it. Now you have the information you need.
You'll notice that the elements of lines are of type 'bytes'. Now, there's probably a good way to deal with that in python, but let's not go in deeper than we have to.

lines = [str(i) for i in lines]

After some careful inspection, I see that the slog urls start on the line containing "The SLOGs (courSe bLOGs) below are by students in Winter 2014 CSC148". Also, there are exactly two lines after the last url.

for i in range(len(lines)):
    if "The SLOGs (courSe bLOGs) below" in lines[i]:
        place = i
        break

lines = lines[place:-2]

Now, I really only want the urls. Again, I'm sure there's a really good way to do this. But hey, who doesn't love the good old brute-force method?

urls = ()

for line in lines:
    start = line.index("href=")
    end = line.index(">http")
    urls += line[start + 6: end - 1],

Beautiful, isn't it? Not really. But at least it works. At least until the instructors decide to reformat their website. (But they made the site automatedly, right? I mean, surely they didn't ask a grad student to type in all 518 or so urls...) 
And finally, the random part.

import random

You know the drill. However, for my own nefarious purposes, I want some user-specified number of urls, not just one. So I'll ask.

k = input("How many urls would you like?\nEnter the number '0', or a natural number between 1 and {}, please. ".format(str(len(lines))))

Now I use the randomness. Also, error-checking. 

try:
    random_urls = random.sample(urls,int(k))
    
except ValueError:
    print("http://en.wikipedia.org/wiki/Natural_number")
    random_urls = []

And now I print.

for i in random_urls:
    print(i)


And there you go. Running it gives you one or more slog urls, all ready for copy-pasting into your browser.

Man, I have more to say on the subject still. Like, you know, actually looking at some of these slogs, and talking about them. I will soon, I promise. But I've written a ton, and the code snippet inserts make it look even longer. Next time ;)

Thanks for bearing with me. If you have a comment, please please please comment. Alternatively, if you really are looking for emotional validation, I suppose I can try to help with that, too.

-- David